[Paper Summary📃] Learning Transferable Architectures for Scalable Image Recognition

- NASNet -

๋…ผ๋ฌธ์›๋ณธ๐Ÿ˜™


1. Introduction

This paper proposes the NAS framework, a new paradigm for designing convolutional architectures by optimizing the architecture on the dataset of interest.

NAS framework

A framework that uses reinforcement learning to optimize network architectures.


NASNet features

  • Where conv blocks used to be designed by hand, NASNet designs its blocks using reinforcement learning and an RNN. 🙌🏻🙌🏻
  • It builds on NAS, but differs in what it searches over:

    NAS: the search space is the entire model architecture.
    NASNet: the search space is the convolutional layer (cell) that the model is built from. As a result, every convolutional network in the search space is made of convolutional layers that share the same structure and differ only in their weights.

This reduces the problem to the simpler one of finding the best cell structure, and the approach has two advantages.

(1) It is faster than searching for an entire network architecture (reportedly about 7x faster than NAS).

(2) The cell generalizes well to other problems (it also fits architectures for other image sizes, and a model searched on CIFAR-10 keeps SOTA accuracy when transferred to ImageNet).

The key point of the paper is:

“The best architecture (NASNet) was found on CIFAR-10 and then transferred to ImageNet with little modification, achieving *SOTA accuracy.”

*State-Of-The-Art


  • The proposed method is related to earlier work on hyperparameter optimization, and in particular to recent approaches to architecture design such as Neural Fabrics, DiffRNN, MetaQNN, and DeepArchitect.

  • Evolutionary algorithms have also been applied to architecture design, but have not shown particularly good results at large scale.

  • Having one neural network interact with another, or learning from metadata, has recently attracted attention, but most of these methods have not been applied to large-scale data such as ImageNet.

  • The design of the search space was inspired by LSTMs and the Neural Architecture Search Cell.

  • The modular structure of the convolutional cell is related to VGG, Inception, ResNet/ResNeXt, and Xception/MobileNet.


3. Method

3-1. NAS Overview

image

์œ„ ๊ทธ๋ฆผ์€ NAS์˜ ์ „์ฒด์ ์ธ ๋ชจ์Šต์„ ๋ณด์—ฌ์ค€๋‹ค.

๋จผ์ €, Search Space์— ์žˆ๋Š” ํ™•๋ฅ ๊ฐ’ p๋กœ ๋ถ€ํ„ฐ ์•„ํ‚คํ…์ณ(์ƒ˜ํ”Œ๋ชจ๋ธ A)๋ฅผ ์˜ˆ์ธกํ•˜๋Š”๋ฐ, ์ด๋•Œ A๋Š” ํŠน์ • Validation set์— ๋Œ€ํ•ด R ์ •ํ™•๋„๋ฅผ ๊ฐ€์ง„๋‹ค. ์ด accuracy๊ฐ’์€ controller๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋œ๋‹ค. ๊ทธ๋Ÿผ์œผ๋กœ์จ controller๋Š” ๋งค ์ˆœ๊ฐ„ ๋” ์ข‹์€ ๊ตฌ์กฐ๋ฅผ ์ƒ์„ฑํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
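As a rough illustration of this sample, evaluate, update loop, here is a minimal Python sketch on a toy problem. Everything in it is an assumption made for illustration: the "controller" is just three logits rather than an RNN, `toy_reward` is a hypothetical stand-in for training child network A and measuring its validation accuracy R, and the update is a plain policy-gradient (REINFORCE-style) step with a moving-average baseline, which is a simplification and not necessarily the update rule used in the paper.

```python
import math
import random


def softmax(logits):
    """Turn logits into the sampling probabilities p."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(choice):
    # Hypothetical stand-in for "train child network A, return validation accuracy R".
    return [0.60, 0.70, 0.90][choice]


def train_controller(steps=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]  # one logit per candidate architecture (assumption)
    baseline = 0.0            # moving average of past rewards
    for _ in range(steps):
        probs = softmax(logits)
        # Sample architecture A with probability p.
        a = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        r = toy_reward(a)     # accuracy R of the sampled architecture
        baseline = 0.9 * baseline + 0.1 * r
        advantage = r - baseline
        # Gradient of log p(a) w.r.t. the logits is onehot(a) - probs; scale it by the reward signal.
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


final_probs = train_controller()
print(final_probs)
```

In this toy setting the probability mass should drift toward the candidate with the highest reward, which is the sense in which the controller "generates better architectures" as it is updated.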


3-2 Normal cell, Reduction cell

image

To build an architecture that can be applied to any image size, two kinds of convolutional cell are needed, each serving a main function when taking a feature map as input.

1) Normal cell: returns a feature map of the same dimensions

2) Reduction cell: returns a feature map with height and width reduced by ½
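The difference between the two cells can be written down as a shape rule. The sketch below tracks only the spatial size (height, width); the function names are hypothetical, and rounding odd sizes up is an assumption (it mimics a stride-2 operation with "same" padding), not something stated in the post.

```python
def normal_cell_shape(height, width):
    """A normal cell keeps the spatial dimensions of its input."""
    return (height, width)


def reduction_cell_shape(height, width, stride=2):
    """A reduction cell halves height and width (rounding up, by assumption)."""
    return (-(-height // stride), -(-width // stride))


# Trace a 32x32 input through normal -> reduction -> normal -> reduction.
shape = (32, 32)
for cell in ("normal", "reduction", "normal", "reduction"):
    fn = normal_cell_shape if cell == "normal" else reduction_cell_shape
    shape = fn(*shape)
    print(cell, shape)
```

Because neither rule depends on a fixed input size, the same pair of cells can be reused on inputs of different resolutions.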


Block, Controller RNN

The figure below shows a Block, the smallest unit in the overall network structure. A block takes two hidden states as input, applies one operation to each (two operations in total), and then performs one combine operation.

image

How a cell is found 😺

Step 1: Select one hidden state from hi, hi-1, or the hidden states created by earlier blocks (hi and hi-1 are the cell's two input hidden states, i.e. the outputs of the two preceding cells or the input image)

Step 2: Select a second hidden state in the same way as in Step 1

Step 3: Select an operation to apply to the hidden state selected in Step 1

Step 4: Select an operation, from the list below, to apply to the hidden state selected in Step 2

image

Step 5: Select a method for combining the outputs of Step 3 and Step 4 to create a new hidden state

The combination method is one of two: (1) element-wise addition or (2) concatenation

One block is produced by going through Step 1 to Step 5.

This process is repeated 5 times to produce 5 blocks, and those 5 blocks make up one cell (B = 5).

Because both a Reduction cell and a Normal cell must be produced this way, the controller RNN makes 2 × 5B softmax predictions in total.
(The first 5B predictions are for the Normal cell, and the second 5B predictions are for the Reduction cell.)


How the architecture unfolds

image

As the figure above shows, B blocks make up one cell, and the network is formed by repeating normal cells and reduction cells.


Network Architecture

image

  • ImageNet images are 299x299, larger than the 32x32 images of CIFAR-10, so the ImageNet architecture has more reduction cells.

  • To match the scale of the image classification problem, the number of repetitions N and the number of initial convolutional filters are left as free parameters.

  • The assembled network is trained on the training data, and its validation accuracy is then measured and used as the reward for reinforcement learning.


For this paper I skipped implementing the model search itself, since the computer updates the architecture as it learns and the final model comes out of running 500 GPUs for 4 days.

Instead, I referred to the figures in the paper and wrote a rough implementation based on the reduction cell and normal cell, but test accuracy came out at only about 64%. Haha. I need to debug a bit more and clean up the code to improve the performance.


References

[1] https://hoya012.github.io/blog/Learning-Transferable-Architectures-for-Scalable-Image-Recognition-Review/

[2] https://deep-learning-study.tistory.com/543
