[ํŒŒ์ดํ† ์น˜] ํŒŒ์ดํ† ์น˜๋กœ CNN ๋ชจ๋ธ์„ ๊ตฌํ˜„ํ•ด๋ณด์ž! (VGGNetํŽธ)

Posted by Euisuk's Dev Log on November 27, 2021

[ํŒŒ์ดํ† ์น˜] ํŒŒ์ดํ† ์น˜๋กœ CNN ๋ชจ๋ธ์„ ๊ตฌํ˜„ํ•ด๋ณด์ž! (VGGNetํŽธ)

Original post: https://velog.io/@euisuk-chung/ํŒŒ์ดํ† ์น˜-ํŒŒ์ดํ† ์น˜๋กœ-CNN-๋ชจ๋ธ์„-๊ตฌํ˜„ํ•ด๋ณด์ž-VGGNetํŽธ

Hello! Starting with this post and continuing over the next two, I will introduce and implement the models that form the backbone of CNN architectures: VGGNet, GoogLeNet, and ResNet! :) This post covers VGGNet.

First, some context: the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) is a competition whose goal is to classify a large-scale image dataset into 1,000 classes. The figure below shows the early models that popularized the CNN architecture, which succeeded one another in the lineage AlexNet (2012) - VGGNet (2014) - GoogLeNet (2014) - ResNet (2015).

ILSVRC

Source : https://icml.cc/2016/tutorials/

์œ„์˜ ๊ทธ๋ฆผ์—์„œ layers๋Š” CNN layer์˜ ๊ฐœ์ˆ˜(๊นŠ์ด)๋ฅผ ์˜๋ฏธํ•˜๋ฉฐ ์ง๊ด€์ ์ธ ์ดํ•ด๋ฅผ ์œ„ํ•ด์„œ ์•„๋ž˜์ฒ˜๋Ÿผ ๊ทธ๋ฆผ์„ ๊ทธ๋ ค๋ณด์•˜์Šต๋‹ˆ๋‹ค.

Depth Comp

VGGNet Overview

Introduction

VGGNet์ด ์†Œ๊ฐœ๋œ ๋…ผ๋ฌธ์˜ ์ œ๋ชฉ์€ Very deep convolutional networks for large-scale image recognition๋กœ, ๋‹ค์Œ ๋งํฌ์—์„œ ํ™•์ธํ•ด๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋งํฌ

Citations

VGGNet์€ ์‹ ๊ฒฝ๋ง์˜ ๊นŠ์ด๊ฐ€ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ์กฐ์‚ฌํ•˜๊ธฐ ์œ„ํ•ด ํ•ด๋‹น ์—ฐ๊ตฌ๋ฅผ ์‹œ์ž‘ํ•˜์˜€์œผ๋ฉฐ, ์ด๋ฅผ ์ฆ๋ช…ํ•˜๊ธฐ ์œ„ํ•ด 3x3 convolution์„ ์ด์šฉํ•œ Deep CNNs๋ฅผ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค. VGGNet์€ ILSVRC-2014 ๋Œ€ํšŒ์—์„œ GoogLeNet์— ์ด์–ด 2๋“ฑ์„ ์ฐจ์ง€ํ•˜์˜€์œผ๋‚˜, GoogLeNet์— ๋น„ํ•ด ํ›จ์”ฌ ๊ฐ„๋‹จํ•œ ๊ตฌ์กฐ๋กœ ์ธํ•ด 1๋“ฑ์ธ ๋ชจ๋ธ๋ณด๋‹ค ๋”์šฑ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜์—ˆ๋‹ค๋Š” ํŠน์ง•์„ ๊ฐ–๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Experimental Design

๋ชจ๋ธ์€ 3x3 convolution, Max-pooling, Fully Connected Network 3๊ฐ€์ง€ ์—ฐ์‚ฐ์œผ๋กœ๋งŒ ๊ตฌ์„ฑ์ด ๋˜์–ด ์žˆ์œผ๋ฉฐ ์•„๋ž˜ ํ‘œ์™€ ๊ฐ™์ด A, A-LRN, B, C, D, E 5๊ฐ€์ง€ ๋ชจ๋ธ์— ๋Œ€ํ•ด ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

VGG

The window size and activation function settings used are as follows.

  • 3x3 convolution filters (stride: 1)
  • 2x2 max pooling (stride: 2)
  • Activation function: ReLU
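To see why these settings fit together, here is a minimal sketch (variable names are illustrative): two stacked 3x3 convolutions with stride 1 and padding 1 preserve the spatial size while covering the same 5x5 receptive field as a single 5x5 convolution, with fewer weights, and the 2x2/stride-2 max pool halves the resolution:

```python
import torch
import torch.nn as nn

# Two stacked 3x3 convs (stride 1, padding 1) keep the spatial size
# and together cover a 5x5 receptive field.
stacked = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
)
x = torch.randn(1, 64, 32, 32)
print(stacked(x).shape)   # torch.Size([1, 64, 32, 32])

# 2x2 max pooling with stride 2 halves the spatial size.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).shape)      # torch.Size([1, 64, 16, 16])

# Weight count: two 3x3 convs use fewer weights than one 5x5 conv
# over the same 64 -> 64 channels (ignoring biases).
print(2 * (3 * 3 * 64 * 64), 5 * 5 * 64 * 64)  # 73728 102400
```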

📢 A quick note!

์œ„ ํ‘œ์—์„œ conv3-64๋ผ๊ณ  ์จ์žˆ๋Š” ๊ฒƒ์€ 3x3์˜ window_size๋ฅผ ๊ฐ–๊ณ  ์‚ฌ์šฉํ•œ window์˜ ๊ฐœ์ˆ˜๊ฐ€ 64๊ฐœ์ž„์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

Performance

From the performance table below we can see that the model's performance improves as depth increases, and that Local Response Normalization (LRN) has little effect on performance.

Eval

VGGNet Implementation

๊ทธ๋Ÿผ VGGNet์˜ ๊ฐœ์š”๋ฅผ ์‚ดํŽด๋ดค์œผ๋‹ˆ ์ด๋ฒˆ์—๋Š” ์ด๋ฅผ ๊ตฌํ˜„ํ•ด๋ณผ๊นŒ์š”? ๊ตฌํ˜„์€ ์œ„ ์‹คํ—˜ ์„ค๊ณ„ ํ‘œ์˜ D์—ด์˜ ์…‹ํŒ…์„ ๊ตฌํ˜„ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ๋‹ค์‹œ ํ•œ๋ฒˆ ์ค„๊ธ€๋กœ ํ•ด๋‹น ๊ตฌ์กฐ๋ฅผ ์„ค๋ช…ํ•˜์ž๋ฉด ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  • 3x3 convolution x2 (64 channels)
  • 3x3 convolution x2 (128 channels)
  • 3x3 convolution x3 (256 channels)
  • 3x3 convolution x3 (512 channels)
  • 3x3 convolution x3 (512 channels)
  • FC layer x3

    • FC layer 4096
    • FC layer 4096
    • FC layer 1000

VGG16

์ฝ”๋”ฉ์˜ ํŽธ์˜๋ฅผ ์œ„ํ•ด ๊ฐ๊ฐ conv layer๊ฐ€ 2๊ฐœ ์žˆ๋Š” block๊ณผ 3๊ฐœ ์žˆ๋Š” block์„ ๋”ฐ๋กœ ์„ ์–ธํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

conv_2_block

def conv_2_block(in_dim,out_dim):
    model = nn.Sequential(
        nn.Conv2d(in_dim,out_dim,kernel_size=3,padding=1),
        nn.ReLU(),
        nn.Conv2d(out_dim,out_dim,kernel_size=3,padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2,2)
    )
    return model
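As a sanity check (repeating the definition above so the snippet runs on its own), each block keeps the channel count at out_dim and halves the spatial resolution through the final max pool:

```python
import torch
import torch.nn as nn

def conv_2_block(in_dim, out_dim):
    model = nn.Sequential(
        nn.Conv2d(in_dim, out_dim, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_dim, out_dim, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2, 2)
    )
    return model

x = torch.randn(1, 3, 32, 32)        # a CIFAR10-sized input
print(conv_2_block(3, 64)(x).shape)  # torch.Size([1, 64, 16, 16])
```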

conv_3_block

def conv_3_block(in_dim,out_dim):
    model = nn.Sequential(
        nn.Conv2d(in_dim,out_dim,kernel_size=3,padding=1),
        nn.ReLU(),
        nn.Conv2d(out_dim,out_dim,kernel_size=3,padding=1),
        nn.ReLU(),
        nn.Conv2d(out_dim,out_dim,kernel_size=3,padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2,2)
    )
    return model

Define VGG16

class VGG(nn.Module):
    def __init__(self, base_dim, num_classes=10):
        super(VGG, self).__init__()
        self.feature = nn.Sequential(
            conv_2_block(3,base_dim), #64
            conv_2_block(base_dim,2*base_dim), #128
            conv_3_block(2*base_dim,4*base_dim), #256
            conv_3_block(4*base_dim,8*base_dim), #512
            conv_3_block(8*base_dim,8*base_dim), #512
        )
        self.fc_layer = nn.Sequential(
            # CIFAR10 images are 32x32, so the feature map here is 1x1
            nn.Linear(8*base_dim*1*1, 4096),
            # for 224x224 IMAGENET images it would be 7x7:
            # nn.Linear(8*base_dim*7*7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 1000),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(1000, num_classes),
        )

    def forward(self, x):
        x = self.feature(x)
        #print(x.shape)
        x = x.view(x.size(0), -1)
        #print(x.shape)
        x = self.fc_layer(x)
        return x
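A small arithmetic check explains the two Linear alternatives commented in the class above: each of the five conv blocks halves the spatial size via its max pool, so a 32x32 CIFAR10 image ends up 1x1 (hence 8*base_dim*1*1 input features), while a 224x224 ImageNet image ends up 7x7. The helper name below is illustrative:

```python
# Each of the five conv blocks halves the spatial size via its max pool.
def size_after_blocks(size, n_blocks=5):
    for _ in range(n_blocks):
        size //= 2
    return size

print(size_after_blocks(32))   # 1  -> Linear(8*base_dim*1*1, 4096)
print(size_after_blocks(224))  # 7  -> Linear(8*base_dim*7*7, 4096)
```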

Declaring the model, loss, and optimizer

# set the device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# instantiate the VGG class
model = VGG(base_dim=64).to(device)

# define the loss function and optimizer
# (learning_rate is defined in the training settings below)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

load CIFAR10 dataset

  • CIFAR10์€ โ€˜๋น„ํ–‰๊ธฐ(airplane)โ€™, โ€˜์ž๋™์ฐจ(automobile)โ€™, โ€˜์ƒˆ(bird)โ€™, โ€˜๊ณ ์–‘์ด(cat)โ€™, โ€˜์‚ฌ์Šด(deer)โ€™, โ€˜๊ฐœ(dog)โ€™, โ€˜๊ฐœ๊ตฌ๋ฆฌ(frog)โ€™, โ€˜๋ง(horse)โ€™, โ€˜๋ฐฐ(ship)โ€™, โ€˜ํŠธ๋Ÿญ(truck)โ€™๋กœ 10๊ฐœ์˜ ํด๋ž˜์Šค๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋Š” ๋ฐ์ดํ„ฐ์…‹์ž…๋‹ˆ๋‹ค.
  • CIFAR10์— ํฌํ•จ๋œ ์ด๋ฏธ์ง€์˜ ํฌ๊ธฐ๋Š” 3x32x32๋กœ, ์ด๋Š” 32x32 ํ”ฝ์…€ ํฌ๊ธฐ์˜ ์ด๋ฏธ์ง€๊ฐ€ 3๊ฐœ ์ฑ„๋„(channel)์˜ ์ƒ‰์ƒ๋กœ ์ด๋ค„์ ธ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ๋œปํ•ฉ๋‹ˆ๋‹ค.

TRAIN/TEST ๋ฐ์ดํ„ฐ์…‹ ์ •์˜

import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Transform ์ •์˜
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# CIFAR10 TRAIN ๋ฐ์ดํ„ฐ ์ •์˜
cifar10_train = datasets.CIFAR10(root="../Data/", train=True, transform=transform, target_transform=None, download=True)

# CIFAR10 TEST ๋ฐ์ดํ„ฐ ์ •์˜
cifar10_test = datasets.CIFAR10(root="../Data/", train=False, transform=transform, target_transform=None, download=True)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

TRAIN ๋ฐ์ดํ„ฐ์…‹ ์‹œ๊ฐํ™”

import matplotlib.pyplot as plt
import numpy as np

# helper function to display an image

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# fetch a random batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)  # dataiter.next() no longer works in recent PyTorch

# show the images
imshow(torchvision.utils.make_grid(images))

# print the labels
print(' '.join('%5s' % classes[labels[j]] for j in range(batch_size)))

Source : https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

์‹œ๊ฐํ™”

TRAIN & TEST

์ด์ œ ๋ฐ์ดํ„ฐ์…‹๋„ ์ •์˜ํ•ด์คฌ์œผ๋‹ˆ ๋ณธ๊ฒฉ์ ์œผ๋กœ ํ•™์Šต ๋ฐ ๊ฒ€์ฆ์„ ์ˆ˜ํ–‰ํ•ด ๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ํ•™์Šต ์„ค์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ •์˜ํ•ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

batch_size = 100
learning_rate = 0.0002
num_epoch = 100

TRAIN

from tqdm import trange  # progress bar over epochs

loss_arr = []
for i in trange(num_epoch):
    for j,[image,label] in enumerate(train_loader):
        x = image.to(device)
        y_= label.to(device)

        optimizer.zero_grad()
        output = model(x)
        loss = loss_func(output,y_)
        loss.backward()
        optimizer.step()

    if i % 10 == 0:
        print(loss)
        loss_arr.append(loss.cpu().detach().numpy())

loss ์‹œ๊ฐํ™”

plt.plot(loss_arr)
plt.show()

loss

Test results

# ๋งž์€ ๊ฐœ์ˆ˜, ์ „์ฒด ๊ฐœ์ˆ˜๋ฅผ ์ €์žฅํ•  ๋ณ€์ˆ˜๋ฅผ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.
correct = 0
total = 0

model.eval()

# ์ธํผ๋Ÿฐ์Šค ๋ชจ๋“œ๋ฅผ ์œ„ํ•ด no_grad ํ•ด์ค๋‹ˆ๋‹ค.
with torch.no_grad():
    # ํ…Œ์ŠคํŠธ๋กœ๋”์—์„œ ์ด๋ฏธ์ง€์™€ ์ •๋‹ต์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
    for image,label in test_loader:
        
        # ๋‘ ๋ฐ์ดํ„ฐ ๋ชจ๋‘ ์žฅ์น˜์— ์˜ฌ๋ฆฝ๋‹ˆ๋‹ค.
        x = image.to(device)
        y= label.to(device)

        # ๋ชจ๋ธ์— ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ๊ณ  ๊ฒฐ๊ณผ๊ฐ’์„ ์–ป์Šต๋‹ˆ๋‹ค.
        output = model.forward(x)
        _,output_index = torch.max(output,1)

        
        # ์ „์ฒด ๊ฐœ์ˆ˜ += ๋ผ๋ฒจ์˜ ๊ฐœ์ˆ˜
        total += label.size(0)
        correct += (output_index == y).sum().float()
    
    # ์ •ํ™•๋„ ๋„์ถœ
    print("Accuracy of Test Data: {}%".format(100*correct/total))

Accuracy of Test Data: 82.33999633789062%

Thank you for reading this long post! ^~^


