[νŒŒμ΄ν† μΉ˜] νŒŒμ΄ν† μΉ˜λ‘œ CNN λͺ¨λΈμ„ κ΅¬ν˜„ν•΄λ³΄μž! (기초편 + DataLoader μ‚¬μš©λ²•)

Posted by Euisuk's Dev Log on November 26, 2021

[νŒŒμ΄ν† μΉ˜] νŒŒμ΄ν† μΉ˜λ‘œ CNN λͺ¨λΈμ„ κ΅¬ν˜„ν•΄λ³΄μž! (기초편 + DataLoader μ‚¬μš©λ²•)

MNIST Data - CNN Practice

μ˜€λŠ˜μ€ MNIST λ°μ΄ν„°λ‘œ Convolutional Neural Network(μ΄ν•˜ CNN)을 κ΅¬ν˜„ν•˜κ³  λŒλ €λ³΄λŠ” μ‹œκ°„μ„ 갖도둝 ν•˜κ² μŠ΅λ‹ˆλ‹€!

First, a CNN is built from the following main components:

  • Convolution: layers that extract features from the image
  • Max Pooling: condenses the extracted features so that only the most important information is passed on
  • Fully Connected Network: layers that make the final prediction from the extracted features

(Figure: CNN architecture)
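Before building the full model, it may help to see how a single convolution and pooling step change tensor shapes. A minimal sketch (a toy layer for illustration, not the model defined below):

```python
import torch
import torch.nn as nn

# One MNIST-sized input: (batch, channels, height, width)
x = torch.randn(1, 1, 28, 28)

conv = nn.Conv2d(1, 16, kernel_size=5)  # 5x5 kernel, no padding: 28 -> 24
pool = nn.MaxPool2d(2, 2)               # halves the spatial size: 24 -> 12

out = pool(torch.relu(conv(x)))
print(out.shape)  # torch.Size([1, 16, 12, 12])
```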


Import Library

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init

import torchvision.datasets as datasets
import torchvision.transforms as transforms

from torch.utils.data import DataLoader

import numpy as np
import matplotlib.pyplot as plt

Set Hyperparameters

batch_size = 100
learning_rate = 0.0002
num_epoch = 10

Load MNIST Data

mnist_train = datasets.MNIST(root="../Data/", train=True, transform=transforms.ToTensor(), download=True)
mnist_test = datasets.MNIST(root="../Data/", train=False, transform=transforms.ToTensor(), download=True)

Define Loaders

train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2, drop_last=True)
test_loader = DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2, drop_last=True)
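To see what the loaders yield, here is a minimal sketch of DataLoader batching, using random tensors as a stand-in for MNIST (hypothetical data with the same shapes) so it runs without a download:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random tensors standing in for MNIST: 500 fake 1x28x28 images with labels 0-9
images = torch.randn(500, 1, 28, 28)
labels = torch.randint(0, 10, (500,))
dataset = TensorDataset(images, labels)

# drop_last=True discards a final incomplete batch (500 = 5 * 100, so none here)
loader = DataLoader(dataset, batch_size=100, shuffle=True, drop_last=True)

for image, label in loader:
    print(image.shape, label.shape)  # torch.Size([100, 1, 28, 28]) torch.Size([100])
    break
```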

Define CNN(Base) Model

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        
        self.layer = nn.Sequential(
            nn.Conv2d(1, 16, 5),   # (1, 28, 28) -> (16, 24, 24)
            nn.ReLU(),
            nn.Conv2d(16, 32, 5),  # (16, 24, 24) -> (32, 20, 20)
            nn.ReLU(),
            nn.MaxPool2d(2, 2),    # (32, 20, 20) -> (32, 10, 10)
            nn.Conv2d(32, 64, 5),  # (32, 10, 10) -> (64, 6, 6)
            nn.ReLU(),
            nn.MaxPool2d(2, 2)     # (64, 6, 6) -> (64, 3, 3)
        )
        self.fc_layer = nn.Sequential(
            nn.Linear(64 * 3 * 3, 100),  # flattened conv output -> 100 hidden units
            nn.ReLU(),
            nn.Linear(100, 10)           # 10 output classes (digits 0-9)
        )
        
    def forward(self, x):
        out = self.layer(x)
        out = out.view(x.size(0), -1)  # flatten per sample; safer than hard-coding batch_size
        out = self.fc_layer(out)
        return out
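The 64 * 3 * 3 input size of the first Linear layer comes from tracing the conv stack's output shape. A quick way to verify it is to push a dummy MNIST-sized input through the same stack:

```python
import torch
import torch.nn as nn

# The same conv stack as the model, used only to inspect the output shape
layer = nn.Sequential(
    nn.Conv2d(1, 16, 5),   # 28 -> 24
    nn.ReLU(),
    nn.Conv2d(16, 32, 5),  # 24 -> 20
    nn.ReLU(),
    nn.MaxPool2d(2, 2),    # 20 -> 10
    nn.Conv2d(32, 64, 5),  # 10 -> 6
    nn.ReLU(),
    nn.MaxPool2d(2, 2),    # 6 -> 3
)

out = layer(torch.randn(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 64, 3, 3]) -> flatten to 64 * 3 * 3 = 576
```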

Define Device & Model

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = CNN().to(device)

Define Loss & Optimizer

loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
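As a reminder of the loss function's interface: nn.CrossEntropyLoss takes raw logits (no softmax applied) and integer class labels. A tiny standalone example with made-up numbers:

```python
import torch
import torch.nn as nn

# CrossEntropyLoss expects raw logits and integer class indices
criterion = nn.CrossEntropyLoss()

logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three class scores
target = torch.tensor([0])                 # the true class index

loss = criterion(logits, target)
print(round(loss.item(), 4))  # 0.2413 = -log(softmax(logits)[0, 0])
```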

πŸ‹οΈ Train Model

Now let's start training. First, we call model.train() to put the model into training mode.

πŸ”§ What is model.train()?

model.train() switches a PyTorch model into training mode.
This call is required so that layers which are only active during training, such as Dropout and BatchNorm, behave correctly. For example:

  • Dropout은 ν•™μŠ΅ μ‹œ 일뢀 λ‰΄λŸ°μ„ λ¬΄μž‘μœ„λ‘œ κΊΌμ„œ 과적합을 λ°©μ§€ν•˜μ§€λ§Œ,
  • Batch Normalization은 배치의 톡계λ₯Ό μ‚¬μš©ν•˜μ—¬ κ°€μ€‘μΉ˜λ₯Ό μ •κ·œν™”ν•©λ‹ˆλ‹€.

If you do not call model.train(), training proceeds with these training-specific behaviors turned off, which can noticeably degrade the model's performance.

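The difference between the two modes can be seen directly on a Dropout layer. A minimal sketch, independent of the model above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fix the random mask for reproducibility
drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.train()         # training mode: elements are zeroed with probability p,
train_out = drop(x)  # survivors are scaled by 1/(1-p) = 2

drop.eval()          # evaluation mode: dropout becomes a no-op
eval_out = drop(x)

print(train_out)  # each element is either 0.0 or 2.0
print(eval_out)   # all ones, unchanged
```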

loss_arr = []

for i in range(num_epoch):
    model.train()  # switch to training mode
    
    for j, [image, label] in enumerate(train_loader):
        x = image.to(device)
        y = label.to(device)

        optimizer.zero_grad()

        output = model(x)
        loss = loss_func(output, y)

        loss.backward()
        optimizer.step()

        if j % 1000 == 0:
            print(f"Epoch {i+1}, Step {j}: Loss = {loss.item():.4f}")
            loss_arr.append(loss.cpu().detach().numpy())

πŸ§ͺ Test Model

ν•™μŠ΅μ΄ μ™„λ£Œλœ λͺ¨λΈμ„ λ°”νƒ•μœΌλ‘œ ν…ŒμŠ€νŠΈ 데이터λ₯Ό μž…λ ₯ν•˜μ—¬ 정확도λ₯Ό ν‰κ°€ν•΄λ΄…λ‹ˆλ‹€. μ΄λ•ŒλŠ” λ‹€μŒ 두 κ°€μ§€ 섀정을 λ°˜λ“œμ‹œ μ μš©ν•΄μ•Ό ν•©λ‹ˆλ‹€.

1️⃣ What is model.eval()?

model.eval()
  • λͺ¨λΈμ„ 평가 λͺ¨λ“œ(Evaluation Mode)둜 μ „ν™˜ν•©λ‹ˆλ‹€.
  • Dropout, BatchNorm λ“±μ˜ λ ˆμ΄μ–΄κ°€ ν•™μŠ΅ μ‹œμ™€λŠ” λ‹€λ₯΄κ²Œ μž‘λ™ν•˜λ„λ‘ μ„€μ •λ©λ‹ˆλ‹€.
  • 예츑 μ‹œμ—λŠ” λͺ¨λ“  λ‰΄λŸ°μ„ ν™œμš©ν•˜κ³ , BatchNorm은 μ €μž₯된 평균과 뢄산을 μ‚¬μš©ν•©λ‹ˆλ‹€.

In other words, because training and inference behave differently, you must call model.eval() before evaluation to get an accurate measure of performance.


2️⃣ What is with torch.no_grad()?

with torch.no_grad():
  • Pytorch의 Autograd 엔진을 κΊΌμ„œ gradient 계산을 ν•˜μ§€ μ•Šλ„λ‘ μ„€μ •ν•©λ‹ˆλ‹€.
  • ν…ŒμŠ€νŠΈλ‚˜ μΆ”λ‘  μ‹œμ—λŠ” 기울기 계산이 ν•„μš” μ—†κΈ° λ•Œλ¬Έμ— λ©”λͺ¨λ¦¬μ™€ 속도 μΈ‘λ©΄μ—μ„œ νš¨μœ¨μ μž…λ‹ˆλ‹€.
  • λ˜ν•œ, GPU λ©”λͺ¨λ¦¬λ₯Ό μ ˆμ•½ν•˜κ³  μ—°μ‚° 속도λ₯Ό 높일 수 μžˆμŠ΅λ‹ˆλ‹€.

βœ… Full Test Code

correct = 0
total = 0

model.eval()  # switch to evaluation mode

with torch.no_grad():  # disable gradient computation
    for image, label in test_loader:
        x = image.to(device)
        y = label.to(device)

        output = model(x)
        _, output_index = torch.max(output, 1)

        total += label.size(0)
        correct += (output_index == y).sum().item()  # count correct predictions

    print("Accuracy of Test Data: {:.2f}%".format(100 * correct / total))
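For reference, torch.max(output, 1) used above returns both the maximum values and their indices along dimension 1; the indices serve as the predicted class labels. A small illustration with made-up logits:

```python
import torch

# torch.max along dim=1 returns (values, indices); indices are the predicted classes
logits = torch.tensor([[0.1, 2.0, -1.0],
                       [3.0, 0.5,  0.2]])
values, pred = torch.max(logits, 1)

print(values)  # tensor([2., 3.])
print(pred)    # tensor([1, 0])
```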

Wrap-Up πŸ“

In this post, we built a CNN model on the MNIST dataset and walked through the full process from training to testing.

In particular, it is very important to clearly understand the meaning and role of model.train(), model.eval(), and torch.no_grad(), each of which must be called at the right point when training and evaluating a PyTorch model.

Once this basic workflow becomes second nature, you will be able to run experiments on more complex models much more efficiently 😊

If you have any questions, please leave a comment πŸ™Œ

Thanks for reading this long post!