
VGGNet took second place in the classification task of the 2014 ImageNet Challenge and demonstrated the payoff of stacking a deep network. As in the AlexNet post, we will follow the paper and go through its main ideas.

Architecture

VGGNet comes in several variants named after their layer counts: 11, 13, 16, and 19. These numbers count weight layers (convolutional plus fully connected layers), so the models range from 11 to 19 weight layers. Broadly, each one repeats a (Conv-ReLU)-(Conv-ReLU)-(maxpool) pattern.

[Figure: VGG-13 architecture]

The figure above shows VGG-13; the exact structures of the 11-, 13-, 16-, and 19-layer variants are given below. A ReLU follows every conv layer, although it is not shown.

[Figure: layer configurations of VGG-11, 13, 16, and 19]
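As a rough cross-check of the naming, the sketch below counts the layers of the four variants. The channel lists follow the configuration table of the VGG paper as it is commonly reproduced; the dictionary keys and the helper name `count_weight_layers` are labels made up for this sketch, not part of any library.

```python
# Per-variant layer lists: an integer is the output channel count of a
# 3x3 conv layer (each followed by ReLU), "M" is a 2x2 max-pool.
# Keys are labels chosen for this sketch.
VGG_CONFIGS = {
    "VGG-11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG-13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
               512, 512, "M", 512, 512, "M"],
    "VGG-16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
               512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG-19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
               512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # every variant ends with three fully connected layers


def count_weight_layers(config):
    """Return (conv layers, conv + FC layers) for one configuration."""
    conv_layers = sum(1 for item in config if item != "M")
    return conv_layers, conv_layers + NUM_FC_LAYERS


for name, config in VGG_CONFIGS.items():
    conv, total = count_weight_layers(config)
    print(f"{name}: {conv} conv + {NUM_FC_LAYERS} FC = {total} weight layers")
```

Under these lists, the number in each model name matches the conv layers plus the three fully connected layers; pooling layers carry no weights and are not counted.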

Because the VGGNet variants are broadly similar, we will come back to them later when looking at the PyTorch implementation.

To summarize the two contributions of the VGGNet paper:

  1. It showed that stacking a CNN deeper yields a meaningful performance improvement.
  2. It showed that a CNN this deep can actually be trained, given a careful training procedure.

Let's look at each contribution in turn.

Very Deep CNN

VGGNet uses 3×3 convolutions almost exclusively. The authors give the following reasons.

  • A stack of three 3×3 convolutions has the same receptive field as a single 7×7 convolution (that is, each output depends on the same region of the input). You can verify this by tracing the receptive field layer by layer.
  • The parameter count, however, drops in a ratio of 49 : 27, to roughly half. With $C$ input and $C$ output channels, one 7×7 layer has $49C^2$ weights while three 3×3 layers have $27C^2$.
  • At the same time, a ReLU follows each of the three layers, so the overall function becomes more non-linear.
  • Some configurations also use 1×1 convolutions (one of the 16-layer variants), again to add nonlinearity with only a few parameters.
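The receptive-field and parameter-count claims in the list above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming stride-1 convolutions without dilation, the same number of input and output channels in every layer, and no bias terms; the function names and the channel count `C = 256` are chosen only for illustration.

```python
def stacked_receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of a stack of conv layers.

    Assumes dilation 1. `strides` defaults to 1 for every layer.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) * jump
        jump *= s             # spacing of this layer's outputs in input pixels
    return rf


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (biases ignored) with `channels` in and out per layer."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 256  # illustrative channel count, not taken from the post

print(stacked_receptive_field([7]))        # 7
print(stacked_receptive_field([3, 3, 3]))  # 7

one_7x7 = conv_weights(7, C)
three_3x3 = conv_weights(3, C, num_layers=3)
print(f"{three_3x3 / one_7x7:.2f}")        # 0.55, i.e. 27 / 49
```

The ratio does not depend on `C`, which is why it can be stated simply as 49 : 27.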

Beyond that, the overall design is the same as AlexNet: a deep stack of convolutions followed by fully connected layers at the end that perform the actual classification.

Training

Regularization

  • For regularization, the paper uses suitable data augmentation together with a weight decay of $5 \times 10^{-4}$.
  • Dropout with $p = 0.5$ is also applied in the fully connected layers.

Optimization

  • SGD with Momentum. Initial LR 0.01, Momentum 0.9
  • Whenever the validation accuracy plateaus, the LR is reduced to 1/10 of its current value.
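To make these hyperparameters concrete, here is a small pure-Python sketch of one common formulation of the SGD-with-momentum update with weight decay, plus a plateau-based learning-rate rule. It is not the authors' training code. The names `sgd_momentum_step` and `PlateauLR` are made up for this sketch, and the `patience` value is an assumption: the post only says the LR is cut when validation accuracy stalls, not how a stall is detected.

```python
def sgd_momentum_step(weights, grads, velocity, lr,
                      momentum=0.9, weight_decay=5e-4):
    """One update: v <- momentum * v - lr * (grad + weight_decay * w); w <- w + v."""
    new_velocity = [momentum * v - lr * (g + weight_decay * w)
                    for w, g, v in zip(weights, grads, velocity)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


class PlateauLR:
    """Multiply the LR by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=0.01, factor=0.1, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience  # assumed value, not from the paper
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_accuracy):
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


schedule = PlateauLR(lr=0.01)
for accuracy in [0.50, 0.60, 0.60, 0.60]:  # made-up validation accuracies
    current_lr = schedule.step(accuracy)
print(current_lr)  # about 0.001 after two epochs without improvement

weights, velocity = sgd_momentum_step([1.0], [0.5], [0.0], lr=0.01)
print(weights, velocity)
```

In PyTorch the same roles are played by `torch.optim.SGD` (with its `momentum` and `weight_decay` arguments) and `torch.optim.lr_scheduler.ReduceLROnPlateau`.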

Initialization

One interesting point about VGGNet is that the very deep 19-layer CNN could not be trained from scratch in one go, so the authors first trained the 11-layer version and then kept training while adding layers on top of it. In effect, this amounts to getting a good initialization from a pre-trained model.
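The idea of reusing a trained shallow network to initialize a deeper one can be sketched with plain dictionaries. This is a toy illustration under several assumptions, not the procedure from the paper: weights are flat lists, layers are matched by name and size, the layer names (`conv1_1`, `conv1_2`, `fc8`) are hypothetical, and the Gaussian scale for the new layers is arbitrary.

```python
import random


def init_from_shallower(deep_layer_sizes, shallow_weights, seed=0):
    """Build initial weights for a deeper net.

    Layers that also exist in the shallower net (same name, same size) are
    copied from it; all other layers get small random values.
    Returns (weights, names of copied layers).
    """
    rng = random.Random(seed)
    deep_weights, copied = {}, []
    for name, size in deep_layer_sizes.items():
        source = shallow_weights.get(name)
        if source is not None and len(source) == size:
            deep_weights[name] = list(source)  # reuse the trained values
            copied.append(name)
        else:
            # New layer: small random init (scale chosen arbitrarily here).
            deep_weights[name] = [rng.gauss(0.0, 0.01) for _ in range(size)]
    return deep_weights, copied


# Hypothetical trained weights of a shallow net, and layer sizes of a deeper one.
shallow = {"conv1_1": [0.1, 0.2, 0.3, 0.4], "fc8": [0.5, 0.6, 0.7]}
deep_sizes = {"conv1_1": 4, "conv1_2": 4, "fc8": 3}

weights, copied = init_from_shallower(deep_sizes, shallow)
print(copied)              # ['conv1_1', 'fc8']
print(weights["conv1_1"])  # [0.1, 0.2, 0.3, 0.4]
```

Only the layer missing from the shallow net starts from random values, so training of the deeper net begins close to a solution that already works.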

Results and Significance

VGGNet reached a top-5 error of about 6.8% in the 2014 ImageNet Challenge, a large improvement over AlexNet's 16%.

  • GoogLeNet, the 2014 winner, has nearly the same accuracy at 6.7%, but, as a later post will cover, its internal structure is far more complex than VGG's.
  • As a result, VGGNet is easier to reuse in later deep learning research, for experiments or other purposes. A representative example is FCN, a semantic segmentation model, which uses the VGGNet architecture to extract image features.
  • It showed, among other things, how initialization can make training work better and why several small convolutions are preferable to one large convolution.