[ํŒŒ์ดํ† ์น˜] ํŒŒ์ดํ† ์น˜๋กœ CNN ๋ชจ๋ธ์„ ๊ตฌํ˜„ํ•ด๋ณด์ž! (GoogleNetํŽธ)

Posted by Euisuk's Dev Log on November 28, 2021

[ํŒŒ์ดํ† ์น˜] ํŒŒ์ดํ† ์น˜๋กœ CNN ๋ชจ๋ธ์„ ๊ตฌํ˜„ํ•ด๋ณด์ž! (GoogleNetํŽธ)

Original post: https://velog.io/@euisuk-chung/ํŒŒ์ดํ† ์น˜-ํŒŒ์ดํ† ์น˜๋กœ-CNN-๋ชจ๋ธ์„-๊ตฌํ˜„ํ•ด๋ณด์ž-GoogleNetํŽธ

Hello! Following up on the previous post about VGGNet, today's post covers GoogleNet. I will be back next with a post on ResNet.

์ง€๋‚œ๋ฒˆ์—๋„ ์†Œ๊ฐœ๋“œ๋ ธ๋‹ค์‹œํ”ผ ์ปดํ“จํ„ฐ ๋น„์ „ ๋Œ€ํšŒ ์ค‘์— ILSVRC (Imagenet Large Scale Visual Recognition Challenges)์ด๋ผ๋Š” ๋Œ€ํšŒ๊ฐ€ ์žˆ๋Š”๋ฐ, ๋ณธ ๋Œ€ํšŒ๋Š” ๊ฑฐ๋Œ€ ์ด๋ฏธ์ง€๋ฅผ 1000๊ฐœ์˜ ์„œ๋ธŒ์ด๋ฏธ์ง€๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ๊ทธ๋ฆผ์€ CNN๊ตฌ์กฐ์˜ ๋Œ€์ค‘ํ™”๋ฅผ ์ด๋Œ์—ˆ๋˜ ์ดˆ์ฐฝ๊ธฐ ๋ชจ๋ธ๋“ค๋กœ AlexNet (2012) - VGGNet (2014) - GoogleNet (2014) - ResNet (2015) ์ˆœ์œผ๋กœ ๊ณ„๋ณด๋ฅผ ์ด์–ด๋‚˜๊ฐ”์Šต๋‹ˆ๋‹ค.

ILSVRC

Source : https://icml.cc/2016/tutorials/

์œ„์˜ ๊ทธ๋ฆผ์—์„œ layers๋Š” CNN layer์˜ ๊ฐœ์ˆ˜(๊นŠ์ด)๋ฅผ ์˜๋ฏธํ•˜๋ฉฐ ์ง๊ด€์ ์ธ ์ดํ•ด๋ฅผ ์œ„ํ•ด์„œ ์•„๋ž˜์ฒ˜๋Ÿผ ๊ทธ๋ฆผ์„ ๊ทธ๋ ค๋ณด์•˜์Šต๋‹ˆ๋‹ค.

Depth Comp

GoogleNet Overview

Introduction

GoogleNet์ด ์†Œ๊ฐœ๋œ ๋…ผ๋ฌธ์˜ ์ œ๋ชฉ์€ Going Deeper with Convolutions๋กœ, ๋‹ค์Œ ๋งํฌ์—์„œ ํ™•์ธํ•ด๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋งํฌ

Quote

์ผ๋ฐ˜์ ์œผ๋กœ ๋ชจ๋ธ์˜ depth์™€ width๊ฐ€ ์ปค์ง€๋ฉด parameter๊ฐ€ ๋งŽ์•„์ง€๊ฒŒ ๋˜๊ณ , ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๊ฐ€ ๋งŽ๊ฒŒ๋˜๋ฉด ๋ชจ๋ธ์ด Overfitting๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. GoogleNet์€ ์ตœ๋Œ€ํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ค„์ด๋ฉด์„œ ๋„คํŠธ์›Œํฌ๋ฅผ ๊นŠ๊ฒŒ ๋””์ž์ธํ•˜๊ณ ์ž ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ์˜ ๋ ˆ์ด์–ด๊ฐ€ ๊นŠ๋”๋ผ๋„ ์—ฐ๊ฒฐ์ด sparseํ•˜๋‹ค๋ฉด ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๊ฐ€ ์ค„์–ด๋“ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๋Š” Overfitting ์„ ๋ฐฉ์ง€ํ•˜๋Š” ํšจ๊ณผ๊ฐ€ ์žˆ์œผ๋ฉด์„œ, ์—ฐ์‚ฐ ์ž์ฒด๋Š” Denseํ•˜๊ฒŒ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค.

GoogleNet์€ ์ธ์…‰์…˜(inception) ๋ชจ๋“ˆ์ด๋ผ๋Š” ๋ธ”๋ก์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด์„œ ์ธ์…‰์…˜ ๋„คํŠธ์›Œํฌ๋ผ๊ณ ๋„ ๋ถˆ๋ฆฝ๋‹ˆ๋‹ค. ์•„๋ž˜ ๊ทธ๋ฆผ์€ GoogleNet์˜ ๊ตฌ์กฐ๋„ ์ž…๋‹ˆ๋‹ค. ์ด์ „ ๋ชจ๋ธ๋“ค๊ณผ ๋น„๊ตํ•˜์—ฌ ํ›จ์”ฌ ๋ณต์žกํ•œ ๊ตฌ์กฐ๋กœ ๋˜์–ด ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

GoogleNet๊ตฌ์กฐ๋„

Source : https://arxiv.org/abs/1409.4842

Inception Module

์œ„์˜ ๊ตฌ์กฐ๋„๋ฅผ ์‚ดํŽด๋ณด๋ฉด ๋ญ”๊ฐ€ ์—ฌ๋Ÿฌ๊ฐˆ๋ž˜๋กœ ๊ฐˆ๋ผ์กŒ๋‹ค๊ฐ€ ๋ชจ์ด๋Š” ํ˜•ํƒœ๋ฅผ ํ•˜๊ณ  ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ ‡ํ•œ ๋ชจ๋“ˆ(๋ธ”๋ก)์„ ์ธ์…‰์…˜(Inception) ๋ชจ๋“ˆ์ด๋ผ๊ณ  ํ•˜๋ฉฐ ์ด๋Š” ์•„๋ž˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Inception

์ธ์…‰์…˜ ๋ชจ๋“ˆ์€ Feature๋ฅผ ํšจ์œจ์ ์œผ๋กœ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•ด 1x1, 3x3, 5x5 convolution ์—ฐ์‚ฐ์„ ๊ฐ๊ฐ ์ˆ˜ํ–‰ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๊ฐ€์žฅ ๋จผ์ € ์ขŒ์ธก์— ์žˆ๋Š” (a) Naive Inception์— ๋Œ€ํ•˜์—ฌ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

(a) Naive Inception

Naive Inception์€ ์ด์ „ ๋‹จ๊ณ„์˜ Activation Map์— ๋Œ€ํ•˜์—ฌ 1x1, 3x3, 5x5ํ•ฉ์„ฑ๊ณฑ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•œ ๋’ค ๊ฐ๊ฐ์˜ ๊ฒฐ๊ณผ๋ฅผ concatenateํ•ด์ฃผ์–ด ๋‹ค์Œ ๋‹จ๊ณ„๋กœ ๋„˜์–ด๊ฐ‘๋‹ˆ๋‹ค. ์ด๋Š” ์•™์ƒ๋ธ”์˜ ํšจ๊ณผ๋ฅผ ๊ฐ–๊ณ  ์žˆ์–ด ๋ชจ๋ธ์ด ๋‹ค์–‘ํ•œ ๊ด€์ ์œผ๋กœ ํ•™์Šต์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ํšจ๊ณผ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์‹œ๊ฐํ™”๋ฅผ ํ†ตํ•ด ์–ด๋–ค ์‹์œผ๋กœ ๊ฐ๊ฐ Naive Inception๋ชจ๋“ˆ์˜ convolution์—ฐ์‚ฐ์ด ์ˆ˜ํ–‰๋˜๊ณ  ํ•ฉ์ณ์ง€๋Š”์ง€ ์•„๋ž˜ ๋„์‹œํ™”ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค.

Naive Inception

์ด๋•Œ ๊ทธ๋ฆผ์„ ํ†ตํ•ด ํ™•์ธํ•  ์ˆ˜ ์žˆ๋Š” ํŠน์ง•๋“ค์ด ๋ช‡๊ฐ€์ง€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

  • 1x1 convolution์€ ๋ณ„๋„์˜ padding์ด ์—†์Šต๋‹ˆ๋‹ค.
  • 3x3 convolution, 5x5 convolution, 3x3 Pooling ํ›„ concatenate ํ•˜๊ธฐ ์œ„ํ•ด ํฌ๊ธฐ๊ฐ€ ๊ฐ™์•„์ง€๋„๋ก ์ ๋‹นํ•œ padding์„ ์ˆ˜ํ–‰ํ•ด์ค๋‹ˆ๋‹ค.
  • ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ pooling layer์˜ ๊ฒฝ์šฐ๋Š” channel์ˆ˜๊ฐ€ ๋ฐ”๋€Œ์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ์ธต์ด ๊นŠ์–ด์งˆ์ˆ˜๋ก ๊ธฐํ•˜๊ธ‰์ˆ˜์ ์œผ๋กœ channel์ˆ˜๊ฐ€ ๋งŽ์•„์ง€๋Š” ๊ตฌ์กฐ๊ฐ€ ๋˜๊ฒŒ๋ฉ๋‹ˆ๋‹ค.

(b) Reduced Dimension Inception

1x1 Convolution

To address this, (b) the Inception module with dimension reduction was proposed. It uses 1x1 convolutions to reduce the dimensionality, which lowers the computational load. Intuitively, a 1x1 convolution is simply a convolution layer whose filters are of size 1x1.

1x1conv

์œ„ ๊ทธ๋ฆผ์—์„œ ๋ณด์‹ค ์ˆ˜ ์žˆ๋“ฏ์ด input์— 1x1 Convolution์„ ์ˆ˜ํ–‰ํ•˜๊ฒŒ ๋˜๋ฉด feature map์˜ ํฌ๊ธฐ๊ฐ€ ๋™์ผํ•˜๊ฒŒ ์œ ์ง€๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์œ„ ๊ทธ๋ฆผ์˜ ๊ฒฝ์šฐ (6x6)์ด (6x6)์œผ๋กœ ๊ทธ๋Œ€๋กœ ๋ณด์กด๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋•Œ ์ด๋Ÿฌํ•œ 1x1 convolution ํ•„ํ„ฐ๋ฅผ ๋ช‡๊ฐœ ์‚ฌ์šฉํ•˜๋Š”๊ฐ€์— ๋”ฐ๋ผ ouput์˜ ์ถœ๋ ฅ ์ฑ„๋„์˜ ๊ฐœ์ˆ˜๋ฅผ ์กฐ์ ˆํ•ด์ค„ ์ˆ˜ ์žˆ๊ฒŒ๋ฉ๋‹ˆ๋‹ค.

GoogleNet์€ ์ด๋Ÿฌํ•œ 1x1 Convolution์˜ ํŠน์ง•์„ ์‚ด๋ ค sparseํ•œ ์—ฐ์‚ฐ์„ denseํ•˜๋„๋ก ๋ฐ”๊ฟ”์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜ ๊ทธ๋ฆผ์„ ์‚ดํŽด๋ณด์‹œ์ฃ .

Comparison

๊ธฐ์กด์˜ (a)์—์„œ 5x5 Convolution์ด ์ขŒ์ธก๊ณผ ๊ฐ™์ด 112.9M๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋ฉด, (b)์—์„œ์ฒ˜๋Ÿผ 1x1 Convolution์„ ํ†ตํ•ด ์ฑ„๋„ ์ˆ˜๋ฅผ ์ค„์—ฌ์ฃผ์–ด Bottleneck์—ญํ• ์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ์ด๋ฅผ ๋‹ค์‹œ 5x5 Convolution์„ ํ†ต๊ณผ์‹œ์ผœ ๊ฐ™์€ output์ธ 14x14x48์„ ๋ฐ˜ํ™˜ํ•ด์คŒ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  5.3M๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฐ–๊ฒŒํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Global Average Pooling (GAP)

Also, if you think of typical CNN models, right before the final softmax they flatten the resulting feature maps into one very long vector, as in the figure below, and feed that vector into a fully connected layer that maps it to the classes one by one.

fully connected layer

ํ•˜์ง€๋งŒ, ์ด๋Ÿฌํ•œ ๊ณผ์ •์€ ๊ธฐ์กด featuremap์˜ ๊ณต๊ฐ„์  ์ •๋ณด๋„ ๋งŽ์ด ์žƒ์–ด๋ฒ„๋ฆฌ๋Š”๋ฐ๋‹ค๊ฐ€, ๊ต‰์žฅํžˆ ๋งŽ์€ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ํ•„์š”ํ•˜๊ฒŒ ๋˜๊ณ , VGGNet์˜ ๊ฒฝ์šฐ ์ด ๋ถ€๋ถ„์ด ์ „์ฒด ๊ณ„์‚ฐ๋Ÿ‰์˜ 85%๋ฅผ ์ฐจ์ง€ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์ด ๋ฐ”๋กœ GAP(Global Average Pooling)๋กœ, ์•„๋ž˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

GAP

์œ„ ๊ทธ๋ฆผ์—์„  7x7 feature map์ด 1024๊ฐœ์ด๋ฏ€๋กœ ๋ถ„๋ฅ˜ํ•  ํด๋ž˜์Šค ์ˆ˜๊ฐ€ 1024๊ฐœ๋ผ๊ณ  ๊ฐ€์ •ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. GAP๋Š” ๊ฐ feature map ์•ˆ์— ์žˆ๋Š” ํŠน์ง•๊ฐ’๋“ค์˜ ํ‰๊ท ์„ ๊ตฌํ•ด์„œ ๊ฐ๊ฐ์˜ ์ถœ๋ ฅ ๋…ธ๋“œ์— ๋ฐ”๋กœ ์ž…๋ ฅํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. ํŒŒ๋ผ๋ฏธํ„ฐ ๊ด€์ ์—์„œ ํ•ด์„ํ•ด์„œ ๋ณด๋ฉด ์œ„์˜ Fully Connected ๊ทธ๋ฆผ์˜ ๊ฒฝ์šฐ, (7x7x1024)x1024 = 51.3M๊ฐœ์˜ Weight๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๊ณ , GAP์˜ ๊ฒฝ์šฐ ๋ฐ”๋กœ ํ‰๊ท ์„ ์ทจํ•˜๊ฒŒ ๋˜๋ฏ€๋กœ Weight์˜ ๊ฐœ์ˆ˜๊ฐ€ 0๊ฐœ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.

Auxiliary Classifier

So far we have looked at the 1x1 convolution and GAP. Another distinctive feature of GoogleNet is that it has auxiliary classifiers.

Auxiliary classifier

์ผ๋ฐ˜์ ์œผ๋กœ Neural Network์€ ๋ชจ๋ธ์ด ์ ์  ๊นŠ์–ด์งˆ์ˆ˜๋ก ๊ธฐ์šธ๊ธฐ๊ฐ€ ์†Œ์‹ค๋˜๋Š” vanishing gradient์˜ ๋ฌธ์ œ๊ฐ€ ์ƒ๊ธฐ๊ฒŒ๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ GoogleNet์—์„œ๋Š” ๊ธฐ์กด์— ์‚ฌ์šฉํ•˜๋˜ ๋ฐฉ๋ฒ•๊ณผ ๊ฐ™์ด Neural Network์˜ ๋งจ ๋งˆ์ง€๋ง‰ layer์— softmax๋ฅผ ๋”ฑ ํ•˜๋‚˜๋งŒ ๋†“์ง€ ์•Š๊ณ , Auxiliary classifier๋ผ๋Š” ์ถ”๊ฐ€์ ์ธ classifier๋ฅผ ์ •์˜ํ•˜์—ฌ ์ค‘๊ฐ„์—์„œ๋„ Backpropagation์ด ์ง„ํ–‰๋  ์ˆ˜ ์žˆ๋„๋ก ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

To keep them from affecting the weights too strongly during backpropagation, the auxiliary classifiers' losses are multiplied by 0.3 during training. Since the auxiliary classifier is a technique for dealing with the vanishing gradient problem during training, it is removed at inference time and only the softmax of the final layer is used.
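For reference, here is a minimal sketch of such an auxiliary head following the paper's description (5x5 average pooling with stride 3, a 1x1 convolution with 128 filters, an FC layer with 1024 units, dropout of 0.7, and a final linear classifier). The class name and exact wiring are my own, and this head is not part of the simplified model implemented below.

import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.head = nn.Sequential(
            nn.AvgPool2d(kernel_size=5, stride=3),   # e.g. 14x14 -> 4x4
            nn.Conv2d(in_dim, 128, kernel_size=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 1024),
            nn.ReLU(),
            nn.Dropout(0.7),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.head(x)

# During training: total_loss = main_loss + 0.3 * (aux1_loss + aux2_loss)
# At inference time the auxiliary heads are simply discarded.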

Code Practice

์ธ์…‰์…˜ ๋ชจ๋“ˆ์„ ์ •์˜ํ•˜๊ธฐ ์œ„ํ•ด ์•„๋ž˜ ํ•จ์ˆ˜๋“ค์„ ๋ฏธ๋ฆฌ ์ •์˜ํ•ด๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

Code practice

  • 1x1 Convolution
  • 1x1 Convolution -> 3x3 Convolution
  • 1x1 Convolution -> 5x5 Convolution
  • 3x3 MaxPooling -> 1x1 Convolution

Define Convolution Blocks

  • 1x1 Convolution
import torch
import torch.nn as nn

def conv_1(in_dim,out_dim):
    model = nn.Sequential(
        nn.Conv2d(in_dim,out_dim,1,1),
        nn.ReLU(),
    )
    return model
  • 1x1 Convolution -> 3x3 Convolution
def conv_1_3(in_dim,mid_dim,out_dim):
    model = nn.Sequential(
        nn.Conv2d(in_dim,mid_dim,1,1),
        nn.ReLU(),
        nn.Conv2d(mid_dim,out_dim,3,1,1),
        nn.ReLU()
    )
    return model
  • 1x1 Convolution -> 5x5 Convolution
def conv_1_5(in_dim,mid_dim,out_dim):
    model = nn.Sequential(
        nn.Conv2d(in_dim,mid_dim,1,1),
        nn.ReLU(),
        nn.Conv2d(mid_dim,out_dim,5,1,2),
        nn.ReLU()
    )
    return model
  • 3x3 MaxPooling -> 1x1 Convolution
def max_3_1(in_dim,out_dim):
    model = nn.Sequential(
        nn.MaxPool2d(kernel_size=3,stride=1,padding=1),
        nn.Conv2d(in_dim,out_dim,1,1),
        nn.ReLU(),
    )
    return model

Define Inception Module

Code practice

Inception ๋ชจ๋“ˆ์€ ์ด์ „ ๋‹จ๊ณ„์—์„œ ์ •์˜ํ•œ ํ•จ์ˆ˜๋“ค์„ ํ™œ์šฉํ•ด์„œ concatํ•ด์ฃผ๋Š” ๋‹จ๊ณ„์ž…๋‹ˆ๋‹ค.

  • 1x1 Convolution
  • 1x1 Convolution -> 3x3 Convolution
  • 1x1 Convolution -> 5x5 Convolution
  • 3x3 MaxPooling -> 1x1 Convolution
class inception_module(nn.Module):
    def __init__(self,in_dim,out_dim_1,mid_dim_3,out_dim_3,mid_dim_5,out_dim_5,pool_dim):
        super(inception_module,self).__init__()
        # 1x1 Convolution
        self.conv_1 = conv_1(in_dim,out_dim_1)
        
        # 1x1 Convolution -> 3x3 Convolution
        self.conv_1_3 = conv_1_3(in_dim,mid_dim_3,out_dim_3)
        
        # 1x1 Convolution -> 5x5 Convolution
        self.conv_1_5 = conv_1_5(in_dim,mid_dim_5,out_dim_5)
        
        # 3x3 MaxPooling -> 1x1 Convolution
        self.max_3_1 = max_3_1(in_dim,pool_dim)

    def forward(self,x):
        out_1 = self.conv_1(x)
        out_2 = self.conv_1_3(x)
        out_3 = self.conv_1_5(x)
        out_4 = self.max_3_1(x)
        # concat
        output = torch.cat([out_1,out_2,out_3,out_4],1)
        return output
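A quick shape check of the module (the numbers are illustrative): the output channel count is the sum of the four branches, e.g. 64 + 128 + 32 + 32 = 256.

block = inception_module(192, 64, 96, 128, 16, 32, 32)
x = torch.randn(1, 192, 28, 28)
print(block(x).shape)  # torch.Size([1, 256, 28, 28])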

Define GoogleNet

GoogleNet์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ตฌ์„ฑ์ด ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

GoogleNet๊ตฌ์กฐ๋„

GoogleNet

์œ„์˜ ํ‘œ์—์„œ ๋‚˜์˜จ ๊ฐ’๋“ค์„ ์ž…๋ ฅํ•˜์—ฌ ๊ฐ๊ฐ์˜ dimension์„ ์ •์˜ํ•ด์ฃผ๋ฉด ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

class GoogLeNet(nn.Module):
    def __init__(self, base_dim, num_classes=2):
        super(GoogLeNet, self).__init__()
        self.num_classes=num_classes
        self.layer_1 = nn.Sequential(
            nn.Conv2d(3,base_dim,7,2,3),
            nn.MaxPool2d(3,2,1),
            nn.Conv2d(base_dim,base_dim*3,3,1,1),
            nn.MaxPool2d(3,2,1),
        )
        self.layer_2 = nn.Sequential(
            inception_module(base_dim*3,64,96,128,16,32,32),
            inception_module(base_dim*4,128,128,192,32,96,64),
            nn.MaxPool2d(3,2,1),
        )
        self.layer_3 = nn.Sequential(
            inception_module(480,192,96,208,16,48,64),
            inception_module(512,160,112,224,24,64,64),
            inception_module(512,128,128,256,24,64,64),
            inception_module(512,112,144,288,32,64,64),
            inception_module(528,256,160,320,32,128,128),
            nn.MaxPool2d(3,2,1),
        )
        self.layer_4 = nn.Sequential(
            inception_module(832,256,160,320,32,128,128),
            inception_module(832,384,192,384,48,128,128), 
            nn.AvgPool2d(7,1),
        )
        self.layer_5 = nn.Dropout2d(0.4)
        self.fc_layer = nn.Linear(1024,self.num_classes)
                
        
    def forward(self, x):
        out = self.layer_1(x)
        out = self.layer_2(out)
        out = self.layer_3(out)
        out = self.layer_4(out)
        out = self.layer_5(out)
        out = out.view(out.size(0), -1)  # flatten per sample instead of relying on the global batch_size
        out = self.fc_layer(out)
        return out
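A dummy forward pass confirms the shapes; note that the model expects 224x224 inputs so that the final AvgPool2d(7) sees a 7x7x1024 feature map.

model = GoogLeNet(base_dim=64, num_classes=10)
x = torch.randn(2, 3, 224, 224)
print(model(x).shape)  # torch.Size([2, 10])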

CIFAR10 Implementation

As with VGGNet, I applied the model to the CIFAR10 dataset. The TRAIN and INFERENCE code is the same as in the previous post. (previous post)

batch_size = 100
learning_rate = 0.0002
num_epoch = 100
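The data loaders, model, loss function, and optimizer are set up as in the previous post; a minimal sketch of that setup is shown below. The Resize(224) transform is my assumption here, added so that the input matches the 7x7 average pooling at the end of the network; the rest follows the usual torchvision CIFAR10 recipe.

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm import trange

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False)

model = GoogLeNet(base_dim=64, num_classes=10).to(device)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)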

TRAIN

loss_arr = []
for i in trange(num_epoch):
    for j,[image,label] in enumerate(train_loader):
        x = image.to(device)
        y_= label.to(device)
        
        optimizer.zero_grad()
        output = model.forward(x)
        loss = loss_func(output,y_)
        loss.backward()
        optimizer.step()

    if i % 10 ==0:
        print(loss)
        loss_arr.append(loss.cpu().detach().numpy())

loss ์‹œ๊ฐํ™”

import matplotlib.pyplot as plt

plt.plot(loss_arr)
plt.show()

loss

TEST

# ๋งž์€ ๊ฐœ์ˆ˜, ์ „์ฒด ๊ฐœ์ˆ˜๋ฅผ ์ €์žฅํ•  ๋ณ€์ˆ˜๋ฅผ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.
correct = 0
total = 0

model.eval()

# ์ธํผ๋Ÿฐ์Šค ๋ชจ๋“œ๋ฅผ ์œ„ํ•ด no_grad ํ•ด์ค๋‹ˆ๋‹ค.
with torch.no_grad():
    # ํ…Œ์ŠคํŠธ๋กœ๋”์—์„œ ์ด๋ฏธ์ง€์™€ ์ •๋‹ต์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
    for image,label in test_loader:
        
        # ๋‘ ๋ฐ์ดํ„ฐ ๋ชจ๋‘ ์žฅ์น˜์— ์˜ฌ๋ฆฝ๋‹ˆ๋‹ค.
        x = image.to(device)
        y= label.to(device)

        # ๋ชจ๋ธ์— ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ๊ณ  ๊ฒฐ๊ณผ๊ฐ’์„ ์–ป์Šต๋‹ˆ๋‹ค.
        output = model.forward(x)
        _,output_index = torch.max(output,1)

        
        # ์ „์ฒด ๊ฐœ์ˆ˜ += ๋ผ๋ฒจ์˜ ๊ฐœ์ˆ˜
        total += label.size(0)
        correct += (output_index == y).sum().float()
    
    # ์ •ํ™•๋„ ๋„์ถœ
    print("Accuracy of Test Data: {}%".format(100*correct/total))

Accuracy of Test Data: 71.98999786376953%

Thank you for reading this long post ^~^


