[๋…ผ๋ฌธ์ •๋ฆฌ๐Ÿ“ƒ] Very Deep Convolutional Networks For Large-Scale Image Recognition

Very Deep Convolutional Networks For Large-Scale Image Recognition

- VGG16 -

๋…ผ๋ฌธ์›๋ณธ๐Ÿ˜™

๋…ผ๋ฌธ ๋‚ด์šฉ๋„ ๋‹จ์ˆœํ•˜๊ณ  ๋ ˆ์ด์–ด ๊ตฌ์กฐ๋„ ์ „์— ๋‚˜์˜จ ๊ตฌ์กฐ๋“ค๋ณด๋‹ค ์‹ฌํ”Œํ–ˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์„ฑ๋Šฅ์ด ๋” ์ข‹๋‹ค๋‹ˆ ์‹ ๊ธฐํ•˜๋ฉด์„œ๋„ ์˜ค๋Š˜๋„ ์—๋Ÿฌ๋ฅผ ๊ณ ์น˜๋ฉฐ ๋‚ด ํ•™์Šต์€ ์™œ ์•ˆ๋˜๋Š”์ง€ ์˜๋ฌธ..ใ…Žใ…Ž

๋…ผ๋ฌธ์—์„œ ๊ฐ•์กฐํ•˜๋Š”๊ฒƒ์€ ๋”ฑ 2๊ฐ€์ง€์ด๋‹ค 3x3์œผ๋กœ ์ž‘์€ receptive field๋ฅผ ์‚ฌ์šฉํ•˜์˜€๊ณ  layer depth ๊ฐ€ ๊นŠ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค! ์งš๊ณ  ๋„˜์–ด๊ฐ€๊ธฐ!

VGG-16 ์ฝ”๋“œ๊ตฌํ˜„ ํŽ˜์ด์ง€. => VGG-16

1. ๊ฐœ์š”

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋Œ€๊ทœ๋ชจ ์ด๋ฏธ์ง€ ์ธ์‹ํ™˜๊ฒฝ์—์„œ์˜ CNN ๊นŠ์ด์— ๋”ฐ๋ฅธ ์ •ํ™•๋„์— ๋Œ€ํ•ด ์—ฐ๊ตฌํ•˜์˜€๋‹ค.

์ด ๋…ผ๋ฌธ์˜ ํ•ต์‹ฌ์€

(1) 3x3 ์˜ ์ž‘์€ receptive field

(2) 16~19 layers ์˜ ๊นŠ์€ ๊ตฌ์กฐ

์„ ์‚ฌ์šฉํ•จ์—๋„ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๋ณด๋‹ค ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๋Š” ๊ฒƒ์ด๋‹ค.

2. architecture

image

Vgg์˜ ๊ตฌ์กฐ๋Š” ํ‘œ์™€ ๊ฐ™์ด 11, 13, 16, 19 layer์— ๋”ฐ๋ฅธ ๋ชจ๋ธ๋“ค์ด ์žˆ๋‹ค. (๋‚ด๊ฐ€ ๊ตฌํ˜„ํ•œ ๊ฒƒ์€ D์˜ VGG-16 ์ด๋‹ค.)

์—ฌ๊ธฐ์„œ ์ฃผ๋ชฉํ•  2๊ฐ€์ง€ ํ•ต์‹ฌ๋‚ด์šฉ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

7x7 Convolution layer ๋Œ€์‹  3x3 Convolution layer์‚ฌ์šฉํ•œ ์ด์œ !

3x3 Convolutional layer์„ 3๊ฐœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์€ 7x7 Convolutional layer์„ 1๊ฐœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๊ณผ ๊ฐ™์ง€๋งŒ, ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๊ฐ€ ์ ๋‹ค

ย ย ย ย ย ย ย ย ย ย ย $3(3^2C^2)<(7^2C^2)$

ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜ ๊ณ„์‚ฐํ•ด๋ณด๋‹ˆ 3X3์„ ์„ธ๋ฒˆ ์—ฐ์‚ฐํ•œ ๊ฐ’์ด 7X7 ํ•œ ๋ฒˆ ๋ณด๋‹ค ์ ์€ ๊ฒƒ์„ ํ™•์ธ!

1x1 Convolutional Layer๋ฅผ ํฌํ•จ์‹œํ‚ค๋Š” ์ด์œ !

Convolutional Layer์˜ receptive field๋ฅผ ๊ฑด๋“ค์ง€ ์•Š๊ณ ๋„ decision function์˜ ๋น„์„ ํ˜•์„ฑ์„ ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ ์œ„ํ•จ.

3. ์„ฑ๋Šฅ

image

ํƒœ๊ทธ: ,

์นดํ…Œ๊ณ ๋ฆฌ:

์—…๋ฐ์ดํŠธ:

๋Œ“๊ธ€๋‚จ๊ธฐ๊ธฐ