[ํŒŒ์ดํ† ์น˜] ํŒŒ์ดํ† ์น˜๋กœ CNN ๋ชจ๋ธ์„ ๊ตฌํ˜„ํ•ด๋ณด์ž! (ResNetํŽธ)

Posted by Euisuk's Dev Log on December 19, 2021

[ํŒŒ์ดํ† ์น˜] ํŒŒ์ดํ† ์น˜๋กœ CNN ๋ชจ๋ธ์„ ๊ตฌํ˜„ํ•ด๋ณด์ž! (ResNetํŽธ)

Original post: https://velog.io/@euisuk-chung/ํŒŒ์ดํ† ์น˜-ํŒŒ์ดํ† ์น˜๋กœ-CNN-๋ชจ๋ธ์„-๊ตฌํ˜„ํ•ด๋ณด์ž-ResNetํŽธ

Hello! Following the previous posts on VGGNet and GoogleNet, today's post covers ResNet.

2๋ฒˆ์— ๊ฑธ์นœ ํฌ์ŠคํŒ…์—์„œ ์†Œ๊ฐœ๋“œ๋ ธ๋‹ค์‹œํ”ผ ์ปดํ“จํ„ฐ ๋น„์ „ ๋Œ€ํšŒ ์ค‘์— ILSVRC (Imagenet Large Scale Visual Recognition Challenges)์ด๋ผ๋Š” ๋Œ€ํšŒ๊ฐ€ ์žˆ๋Š”๋ฐ, ๋ณธ ๋Œ€ํšŒ๋Š” ๊ฑฐ๋Œ€ ์ด๋ฏธ์ง€๋ฅผ 1000๊ฐœ์˜ ์„œ๋ธŒ์ด๋ฏธ์ง€๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ๊ทธ๋ฆผ์€ CNN๊ตฌ์กฐ์˜ ๋Œ€์ค‘ํ™”๋ฅผ ์ด๋Œ์—ˆ๋˜ ์ดˆ์ฐฝ๊ธฐ ๋ชจ๋ธ๋“ค๋กœ AlexNet (2012) - VGGNet (2014) - GoogleNet (2014) - ResNet (2015) ์ˆœ์œผ๋กœ ๊ณ„๋ณด๋ฅผ ์ด์–ด๋‚˜๊ฐ”์Šต๋‹ˆ๋‹ค.

ILSVRC

Source: https://icml.cc/2016/tutorials/

์œ„์˜ ๊ทธ๋ฆผ์—์„œ layers๋Š” CNN layer์˜ ๊ฐœ์ˆ˜(๊นŠ์ด)๋ฅผ ์˜๋ฏธํ•˜๋ฉฐ ์ง๊ด€์ ์ธ ์ดํ•ด๋ฅผ ์œ„ํ•ด์„œ ์•„๋ž˜์ฒ˜๋Ÿผ ๊ทธ๋ฆผ์„ ๊ทธ๋ ค๋ณด์•˜์Šต๋‹ˆ๋‹ค.

Depth Comp

ResNet Overview

Introduction

ResNet์ด ์†Œ๊ฐœ๋œ ๋…ผ๋ฌธ์˜ ์ œ๋ชฉ์€ Going Deeper with Convolutions๋กœ, ๋‹ค์Œ ๋งํฌ์—์„œ ํ™•์ธํ•ด๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (๋งํฌ)

ResNet์˜ ์ €์ž๋“ค์€ ์ผ์ • ์ˆ˜์ค€ ์ด์ƒ์˜ ๊นŠ์ด๊ฐ€ ๋˜๋ฉด ์˜คํžˆ๋ ค ์–•์€ ๋ชจ๋ธ๋ณด๋‹ค ๊นŠ์€ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ๋” ๋–จ์–ด์ง„๋‹ค๋Š” ๊ฒƒ์„ ์•„๋ž˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

26_56_plot

Train error and test error of 20-layer and 56-layer plain networks (from the paper)

ResNet is the model that solved this problem and improved performance through a method called residual learning. The idea is really simple: at a given layer, add the output of the convolution operations to the input itself and pass the sum on to the next layer. (See the figure below.)

residual block

Residual Learning (from the paper)

์œ„ ๊ทธ๋ฆผ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋‹ค์‹œํ”ผ ์ž”์ฐจ ํ•™์Šต์€ ์ด์ „ ๋‹จ๊ณ„์—์„œ ๋ฝ‘์•˜๋˜ ํŠน์„ฑ๋“ค์„ ๋ณ€ํ˜•์‹œํ‚ค์ง€ ์•Š๊ณ , ๊ทธ๋Œ€๋กœ ๋‹ค์Œ ๋‹จ๊ณ„๋กœ ์ „๋‹ฌํ•˜์—ฌ ๋”ํ•ด์ฃผ๊ธฐ ๋•Œ๋ฌธ์— ์•ž์—์„œ ํ•™์Šตํ•œ low-level ํŠน์ง•๊ณผ ๋’ค์—์„œ ํ•™์Šตํ•œ high-level ํŠน์ง•์„ ๋ชจ๋‘ ๋‹ค์Œ block(๋‹จ๊ณ„)๋กœ ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์ „ GoogleNet์˜ ๊ฒฝ์šฐ, Neural Network์˜ Vanishing Gradient ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด Auxilary Classifier๋ฅผ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ, ResNet์˜ ๊ฒฝ์šฐ ๋”ํ•˜๊ธฐ ์—ฐ์‚ฐ์€ ๊ธฐ์šธ๊ธฐ๊ฐ€ 1์ด๊ธฐ ๋•Œ๋ฌธ์— ์—ญ์ „ํŒŒ ์‹œ loss๊ฐ€ ์ค„์ง€ ์•Š๊ณ , ๋ชจ๋ธ ์•ž๊นŒ์ง€ ์ž˜ ์ „ํŒŒ๊ฐ€ ๋œ๋‹ค๋Š” ํŠน์ง•์ด ์žˆ์–ด์„œ GoogleNet๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ Auxilary Classifier๊ฐ€ ๋ณ„๋„๋กœ ํ•„์š”ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ ๊ตฌ์กฐ

Overall Network

๋…ผ๋ฌธ์—์„œ๋Š” VGG-19, 34-layer Plain (without residual) ๋ชจ๋ธ๊ณผ 34-layer Residual ๋ชจ๋ธ์„ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‹œ๊ฐํ™”ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ ๊ตฌ์กฐ

VGG-19, 34-layer Plain & Residual (from the paper)

์œ„ ๊ทธ๋ฆผ์—์„œ ์‹ค์„ ์€ featuremap์˜ dimension์ด ๋ฐ”๋€Œ์ง€ ์•Š์•„ ๊ทธ๋ƒฅ ๋”ํ•ด์ฃผ๋Š” ๊ฒฝ์šฐ์ด๊ณ , ์ ์„ ์€ ์ž…๋ ฅ๋‹จ๊ณผ ์ถœ๋ ฅ๋‹จ์˜ dimension์˜ ์ฐจ์ด๋กœ ์ธํ•ด ์ด๋ฅผ ๋งž์ถฐ์ค„ ์ˆ˜ ์žˆ๋Š” ํ…Œํฌ๋‹‰์ด ์ถ”๊ฐ€์ ์œผ๋กœ ๋”ํ•ด์ง„ shortcut connection์ž…๋‹ˆ๋‹ค.

The table below lists the various ResNet configurations proposed in the paper. The model illustrated in the figure above corresponds to the 34-layer column of this table.

ResNet

ResNet 18, 34, 50, 101, and 152 layer configurations

Plain Network

Plain Network์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ทœ์น™์— ๋”ฐ๋ผ ๋งŒ๋“ค์–ด์กŒ์Šต๋‹ˆ๋‹ค:

  • ๊ฐ™์€ ํฌ๊ธฐ์˜ output feature map์„ ๊ฐ–๊ณ  ์žˆ๋‹ค๋ฉด, ๊ฐ™์€ ์ˆ˜์˜ filters๋ฅผ ๊ฐ–๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.
  • ๋งŒ์•ฝ feature map size๊ฐ€ ๋ฐ˜์œผ๋กœ ์ค„์–ด๋“ค์—ˆ๋‹ค๋ฉด, time-complexity๋ฅผ ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•ด filters์˜ ์ˆ˜๋Š” ๋‘ ๋ฐฐ๊ฐ€ ๋˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.
  • Downsampling์„ ํ•˜๊ธฐ ์œ„ํ•ด์„œ stride๊ฐ€ 2์ธ conv layers๋ฅผ ํ†ต๊ณผ์‹œ์ผœ์ค๋‹ˆ๋‹ค.
  • 1x1 convolution์˜ ๊ฒฝ์šฐ, ๋™์ผํ•œ ์‚ฌ์ด์ฆˆ์˜ feature map์„ ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ๋ณ„๋„์˜ padding์ด ํ•„์š”์—†์Šต๋‹ˆ๋‹ค.
  • ํ•˜์ง€๋งŒ, 3x3 convolution์˜ ๊ฒฝ์šฐ, ๋™์ผํ•œ ์‚ฌ์ด์ฆˆ์˜ feature map์„ ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•ด size 1์˜ padding์ด ํ•„์š”ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
  • Network์˜ ๋งˆ์ง€๋ง‰ ๋‹จ์—๋Š” Global Average Pooling(GAP)๋ฅผ ์ˆ˜ํ–‰ํ•˜๋ฉฐ, ImageNet Classification์„ ๋ชฉ์ ์œผ๋กœ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— 1000-way-fully-connected layer๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.

Residual Network


Residual Network(ResNet)์˜ ๊ธฐ๋ณธ์ ์ธ ์กฐ๊ฑด์€ ์œ„์˜ plain network์™€ ๋™์ผํ•ฉ๋‹ˆ๋‹ค. ํ•œ๊ฐ€์ง€ ๋‹ค๋ฅธ ์ ์€ ๊ฐ๊ฐ์˜ block๋“ค์ด ๋๋‚ ๋•Œ๋งˆ๋‹ค shortcut connection ์ถ”๊ฐ€๋œ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค.

Identity/Projection

  • If the input and output have the same dimensions, the identity shortcut can be used directly. (1)

y = F(x, {W_i}) + x (identity shortcut, Eq. 1 in the paper)

  • ํ•˜์ง€๋งŒ, ์ฐจ์›์ด ๋‹ค๋ฅด๋‹ค๋ฉด identity shortcut์„ ๋ฐ”๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. Identity Shortcut ๋Œ€์‹  Projection Shortcut์ด ์‚ฌ์šฉ๋˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. (By using 1x1 Convolution) (2)

y = F(x, {W_i}) + W_s x (projection shortcut, Eq. 2 in the paper)
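Concretely (a sketch of my own, not code from the paper), the projection W_s is just a 1x1 convolution, strided if needed, that matches both the channel count and the spatial size of the residual branch before the addition:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)
residual   = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)  # F(x): channels and size change
projection = nn.Conv2d(64, 128, kernel_size=1, stride=2)             # W_s: 1x1 conv matches both
y = residual(x) + projection(x)
print(y.shape)  # torch.Size([1, 128, 28, 28])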

Shortcuts Comparison

  • ํ•ด๋‹น ๋…ผ๋ฌธ์—์„œ๋Š” shortcut์˜ ์‚ฌ์šฉ๋ฐฉ๋ฒ•์— ๋”ฐ๋ฅธ ์„ฑ๋Šฅ์„ ์•„๋ž˜์™€ ๊ฐ™์ด ๋น„๊ตํ•ฉ๋‹ˆ๋‹ค.

Table 3

  • (A) Zero-padding shortcuts are used for increasing dimensions
  • (B) Projection shortcuts are used for increasing dimensions
  • (C) All shortcuts are replaced with projection shortcuts

Table 3์„ ๋ณด๋ฉด 3๊ฐ€์ง€ ์˜ต์…˜ ๋ชจ๋‘ Plain Network๋ณด๋‹ค ์„ฑ๋Šฅ์ด ์ข‹์œผ๋ฉฐ, A < B < C์ˆœ์œผ๋กœ ์„ฑ๋Šฅ์ด ์ข‹์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” B๊ฐ€ A๋ณด๋‹ค ๋‚˜์€ ์ด์œ ๋ฅผ A์˜ zero-padding๊ณผ์ •์— residual learning์ด ์—†๊ธฐ ๋•Œ๋ฌธ์ด๋ผ๊ณ  ๋งํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ , C๊ฐ€ B๋ณด๋‹ค ์ข‹์€ ์ด์œ ๋กœ๋Š” extra parameters๊ฐ€ ๋” ๋งŽ๊ธฐ ๋•Œ๋ฌธ์— ์ด๋Š” ์„ฑ๋Šฅ ํ–ฅ์ƒ์œผ๋กœ ์ด์–ด์กŒ๋‹ค๊ณ  ์ด์•ผ๊ธฐ ํ•ฉ๋‹ˆ๋‹ค.

A, B, C์—์„œ์˜ ์ž‘์€ ์ฐจ์ด๋ฅผ ํ†ตํ•ด ์•Œ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์€ Projection Shortcut์€ ๋ณธ ๋…ผ๋ฌธ์—์„œ ๋ฌธ์ œ ์‚ผ๊ณ  ์žˆ๋Š” degradation ๋ฌธ์ œ๋ฅผ address ํ•˜๋Š” ๊ฒƒ์˜ ๋ณธ์งˆ์ด ์•„๋‹ˆ๋ผ๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ๋˜ํ•œ, extra parameter๊ฐ€ ์ถ”๊ฐ€๋˜๋Š” C๋Š” memory & time complexity ๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค.

Deeper Bottleneck Architecture

๋ณธ ๋…ผ๋ฌธ์—์„œ ์ €์ž๋“ค์€ Layer ๊ฐ€ ๊นŠ์–ด์ง€๋ฉด training time ์ด ์ฆ๊ฐ€ํ•˜๋Š” ๊ฒƒ์„ ๋ฐœ๊ฒฌํ•˜์˜€๊ณ , ์ด๋ฅผ ๊ณ ๋ คํ•˜์—ฌ Residual Block์„ ์•„๋ž˜์™€ ๊ฐ™์ด 1x1 Convolution์„ ํ™œ์šฉํ•˜์—ฌ ๊ฐœ์„ ํ•œ Bottleneck Block์„ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Bottleneck Block

Bottleneck Block์€ 1x1, 3x3, 1x1 convolution์œผ๋กœ ๊ตฌ์„ฑ๋œ 3๊ฐœ์˜ Layer๋ฅผ ์Œ“์€ ๊ตฌ์กฐ๋กœ, Basic Block ๋ณด๋‹ค Layer ์ˆ˜๊ฐ€ 1๊ฐœ ๋” ๋งŽ์ง€๋งŒ, time complexity๋Š” ๋น„์Šทํ•˜๋‹ค๋Š” ํŠน์ง•์„ ๊ฐ–๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋•Œ Bottleneck Block์—๋Š” ์•ž์—์„œ ์†Œ๊ฐœํ•œ ์˜ต์…˜ B๋ฅผ ์ ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์ด๋Ÿฐ ๋ฐฉ๋ฒ•์„ ์ ์šฉํ•˜์—ฌ ๊นŠ์€ ๋ชจ๋ธ(50-layer, 101-layer, 152-layer์— ์ ์šฉํ•ด๋ณธ ๊ฒฐ๊ณผ, ๊ธฐ์กด์˜ degradation์˜ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๊ณ , ๊นŠ์ด๊ฐ€ ๋” ๊นŠ์–ด์ง์— ๋”ฐ๋ผ ๋” ์ข‹์•„์ง€๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

Experiments

CIFAR-10

First, here are the results on the CIFAR-10 dataset. Dashed lines denote training error, and solid lines denote test error.

CIFAR10

  • Figure 6์˜ ์ขŒ์ธก์— ์žˆ๋Š” ๊ทธ๋ž˜ํ”„๋Š” residual ์—ฐ์‚ฐ์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š์€ plain network๋ฅผ ์‚ฌ์šฉํ–ˆ์„ ๋•Œ์˜ Error์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ์‚ดํŽด๋ณด๋ฉด layer๊ฐ€ ๊นŠ์„ ์ˆ˜๋ก Error๊ฐ€ ๋†’์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (Degradation ๋ฌธ์ œ)
  • Figure 6์˜ ์ค‘์•™์— ์žˆ๋Š” ๊ทธ๋ž˜ํ”„๋Š” residual ์—ฐ์‚ฐ์„ ์‚ฌ์šฉํ•œ residual network๋ฅผ ์‚ฌ์šฉํ–ˆ์„ ๋•Œ์˜ Error์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ์‚ดํŽด๋ณด๋ฉด layer๊ฐ€ ๊นŠ์„ ์ˆ˜๋ก Error๊ฐ€ ๋‚ฎ์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • Figure 6์˜ ์šฐ์ธก์— ์žˆ๋Š” ๊ทธ๋ž˜ํ”„๋Š” 1202-layer residual network์™€ 110-layer residual network๋กœ, ์œ ์‚ฌํ•œ training error ๋ณด์˜€์ง€๋งŒ test ์„ฑ๋Šฅ์€ ๋” ์ข‹์ง€ ์•Š์€ ๊ฒƒ์œผ๋กœ ๋ณด์•„ Overfitting์ด ๋ฐœ์ƒํ•œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

PASCAL VOC & MS COCO

๊ฐ๊ฐ PASCAL VOC 2007/2012 ๋ฐ์ดํ„ฐ์™€ MS COCO ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•œ Object Detection์— ์žˆ์–ด์„œ๋„ VGGNet์„ ์‚ฌ์šฉํ•œ ๊ฒƒ๋ณด๋‹ค ResNet์„ ์‚ฌ์šฉํ•œ ๊ฒƒ์ด ๋” ์ข‹์€ ์„ฑ๋Šฅ์ด ๋‚˜์˜ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

PASCAL VOC 2007/2012

PASCAL VOC

MS COCO

MS COCO

Code

์ด๋ฒˆ ํฌ์ŠคํŠธ์—์„œ๋Š” ResNet-50์„ ๊ตฌํ˜„ํ•ด๋ณด๋Š” ์‹œ๊ฐ„์„ ๊ฐ–๊ฒ ์Šต๋‹ˆ๋‹ค.

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init

import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms

from torch.utils.data import DataLoader

import numpy as np
import matplotlib.pyplot as plt

import tqdm
from tqdm.auto import trange

ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ

batch_size = 50
learning_rate = 0.0002
num_epoch = 100

Load CIFAR-10

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# define dataset
cifar10_train = datasets.CIFAR10(root="../Data/", train=True, transform=transform, target_transform=None, download=True)
cifar10_test = datasets.CIFAR10(root="../Data/", train=False, transform=transform, target_transform=None, download=True)

# define loader
train_loader = DataLoader(cifar10_train,batch_size=batch_size, shuffle=True, num_workers=2, drop_last=True)
test_loader = DataLoader(cifar10_test,batch_size=batch_size, shuffle=False, num_workers=2, drop_last=True)

# define classes
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
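As a quick sanity check (my addition, not in the original post), you can pull one batch from the loader and confirm its shape:

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # torch.Size([50, 3, 32, 32]) torch.Size([50])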

Basic Module

def conv_block_1(in_dim, out_dim, activation, stride=1):
    # 1x1 convolution + BatchNorm + activation
    model = nn.Sequential(
        nn.Conv2d(in_dim, out_dim, kernel_size=1, stride=stride),
        nn.BatchNorm2d(out_dim),
        activation,
    )
    return model


def conv_block_3(in_dim, out_dim, activation, stride=1):
    # 3x3 convolution (padding=1 keeps the feature map size) + BatchNorm + activation
    model = nn.Sequential(
        nn.Conv2d(in_dim, out_dim, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_dim),
        activation,
    )
    return model

Bottleneck Module

class BottleNeck(nn.Module):
    def __init__(self, in_dim, mid_dim, out_dim, activation, down=False):
        super(BottleNeck, self).__init__()
        self.down = down
        
        # Case where the feature map size is reduced
        if self.down:
            self.layer = nn.Sequential(
                conv_block_1(in_dim, mid_dim, activation, stride=2),
                conv_block_3(mid_dim, mid_dim, activation, stride=1),
                conv_block_1(mid_dim, out_dim, activation, stride=1),
            )
            
            # Matches both the feature map size and the channel count of the shortcut
            self.downsample = nn.Conv2d(in_dim, out_dim, kernel_size=1, stride=2)
            
        # Case where the feature map size stays the same
        else:
            self.layer = nn.Sequential(
                conv_block_1(in_dim, mid_dim, activation, stride=1),
                conv_block_3(mid_dim, mid_dim, activation, stride=1),
                conv_block_1(mid_dim, out_dim, activation, stride=1),
            )
            
        # Matches the channel count (projection shortcut)
        self.dim_equalizer = nn.Conv2d(in_dim, out_dim, kernel_size=1)
                  
    def forward(self, x):
        if self.down:
            downsample = self.downsample(x)
            out = self.layer(x)
            out = out + downsample
        else:
            out = self.layer(x)
            if x.size() != out.size():  # `is not` compared object identity, not the sizes
                x = self.dim_equalizer(x)
            out = out + x
        return out
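A quick shape check of the block (my addition): with down=True the spatial size is halved, the channels are expanded, and the downsample branch keeps the addition valid.

block = BottleNeck(64, 64, 256, nn.ReLU(), down=True)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 256, 28, 28])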

Define ResNet-50

# 50-layer
class ResNet(nn.Module):

    def __init__(self, base_dim, num_classes=10):
        super(ResNet, self).__init__()
        self.activation = nn.ReLU()
        # Stem: 7x7 conv (stride 2) + 3x3 max-pool (stride 2); the paper also
        # applies BatchNorm here, which this implementation omits
        self.layer_1 = nn.Sequential(
            nn.Conv2d(3,base_dim,7,2,3),
            nn.ReLU(),
            nn.MaxPool2d(3,2,1),
        )
        # Stages of bottleneck blocks; note this implementation puts the
        # stride-2 downsampling block (down=True) at the END of each stage
        self.layer_2 = nn.Sequential(
            BottleNeck(base_dim,base_dim,base_dim*4,self.activation),
            BottleNeck(base_dim*4,base_dim,base_dim*4,self.activation),
            BottleNeck(base_dim*4,base_dim,base_dim*4,self.activation,down=True),
        )
        self.layer_3 = nn.Sequential(
            BottleNeck(base_dim*4,base_dim*2,base_dim*8,self.activation),
            BottleNeck(base_dim*8,base_dim*2,base_dim*8,self.activation),
            BottleNeck(base_dim*8,base_dim*2,base_dim*8,self.activation),
            BottleNeck(base_dim*8,base_dim*2,base_dim*8,self.activation,down=True),
        )
        self.layer_4 = nn.Sequential(
            BottleNeck(base_dim*8,base_dim*4,base_dim*16,self.activation),
            BottleNeck(base_dim*16,base_dim*4,base_dim*16,self.activation),
            BottleNeck(base_dim*16,base_dim*4,base_dim*16,self.activation),            
            BottleNeck(base_dim*16,base_dim*4,base_dim*16,self.activation),
            BottleNeck(base_dim*16,base_dim*4,base_dim*16,self.activation),
            BottleNeck(base_dim*16,base_dim*4,base_dim*16,self.activation,down=True),
        )
        self.layer_5 = nn.Sequential(
            BottleNeck(base_dim*16,base_dim*8,base_dim*32,self.activation),
            BottleNeck(base_dim*32,base_dim*8,base_dim*32,self.activation),
            BottleNeck(base_dim*32,base_dim*8,base_dim*32,self.activation),
        )
        # For 32x32 CIFAR-10 inputs the feature map here is already 1x1, so the
        # original AvgPool2d(1,1) was a no-op; adaptive pooling makes the global
        # average pooling explicit and robust to other input sizes
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc_layer = nn.Linear(base_dim*32,num_classes)
        
    def forward(self, x):
        out = self.layer_1(x)
        out = self.layer_2(out)
        out = self.layer_3(out)
        out = self.layer_4(out)
        out = self.layer_5(out)
        out = self.avgpool(out)
        out = out.view(out.size(0), -1)  # flatten without relying on the global batch_size
        out = self.fc_layer(out)
        
        return out
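A quick sanity check (my addition): with base_dim=64 the stages output 256, 512, 1024, and 2048 channels after the stem, matching ResNet-50, and the classifier maps to the 10 CIFAR-10 classes.

model = ResNet(base_dim=64)
x = torch.randn(2, 3, 32, 32)  # a dummy CIFAR-10-sized batch
print(model(x).shape)          # torch.Size([2, 10])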

Train

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # fall back to CPU without a GPU
model = ResNet(base_dim=64).to(device)
loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(),lr=learning_rate)

loss_arr = []
for i in trange(num_epoch):
    for j,[image,label] in enumerate(train_loader):
        x = image.to(device)
        y_= label.to(device)
        
        optimizer.zero_grad()
        output = model(x)  # call the module directly rather than .forward()
        loss = loss_func(output,y_)
        loss.backward()
        optimizer.step()

    if i % 10 == 0:
        print(loss)
        loss_arr.append(loss.cpu().detach().numpy())

Performance (epoch = 100)

Train Loss

Train Loss

Test Accuracy
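The original post reports only the final number below; a minimal evaluation loop to reproduce it might look like this (my own sketch, reusing the loaders and model defined above):

correct = 0
total = 0
model.eval()                  # switch BatchNorm layers to inference mode
with torch.no_grad():         # no gradients needed for evaluation
    for image, label in test_loader:
        x = image.to(device)
        y_ = label.to(device)
        output = model(x)
        _, predicted = torch.max(output, 1)  # class with the highest logit
        total += label.size(0)
        correct += (predicted == y_).sum().item()
print(f"Accuracy of Test Data: {100 * correct / total}%")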

Accuracy of Test Data: 74.33999633789062%


