[ํŒŒ์ดํ† ์น˜] ํŒŒ์ดํ† ์น˜ ๊ธฐ์ดˆ ์š”์†Œ (Autograd๋ž€)

Posted by Euisuk's Dev Log on September 16, 2021

[ํŒŒ์ดํ† ์น˜] ํŒŒ์ดํ† ์น˜ ๊ธฐ์ดˆ ์š”์†Œ (Autograd๋ž€)

Original post: https://velog.io/@euisuk-chung/ํŒŒ์ดํ† ์น˜-ํŒŒ์ดํ† ์น˜-๊ธฐ์ดˆ-์š”์†Œ-Autograd๋ž€

Forward and Backward Propagation

์‹ ๊ฒฝ๋ง(Neural Network)์€ ์–ด๋–ค ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ์‹คํ–‰๋˜๋Š” ์ค‘์ฒฉ๋œ ํ•จ์ˆ˜๋“ค์˜ ์ง‘ํ•ฉ์ฒด์ž…๋‹ˆ๋‹ค. ์‹ ๊ฒฝ๋ง์„ ์•„๋ž˜ 2๋‹จ๊ณ„๋ฅผ ๊ฑฐ์ณ ํ•™์Šต๋ฉ๋‹ˆ๋‹ค :

  1. Forward Propagation
  2. Backward Propagation

NN-Forward-Backward

Forward Propagation

In the forward propagation step, the neural network makes its best guess about the correct answer. To make this guess, it runs the input data through each of its functions.

Backward Propagation

In the backward propagation step, the neural network adjusts its parameters in proportion to the error of its guess. It does this by traversing backwards from the output, collecting the gradients of the error with respect to the parameters of the functions, and optimizing the parameters using gradient descent.
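
These two steps can be sketched with a single learnable scalar in PyTorch (the numbers and the learning rate here are purely illustrative):

```python
import torch

# A tiny "model": y_hat = w * x, with a squared-error loss
x = torch.tensor(2.0)                      # input
y = torch.tensor(10.0)                     # target
w = torch.tensor(3.0, requires_grad=True)  # learnable parameter

# Forward propagation: make a guess
y_hat = w * x                # 6.0
loss = (y_hat - y) ** 2      # (6 - 10)^2 = 16.0

# Backward propagation: collect the gradient of the error w.r.t. w
loss.backward()              # d(loss)/dw = 2 * (w*x - y) * x = -16.0

# One gradient-descent step (learning rate 0.1 chosen for illustration)
with torch.no_grad():
    w -= 0.1 * w.grad        # w: 3.0 -> 4.6
```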

๋‰ด๋Ÿด๋„คํŠธ์›Œํฌ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜

  1. Initialize all weights w randomly

    [Forward Propagation]

  2. Compute the hidden-node values using the input values and the weights w between the input layer and the hidden layer

    (linear combination followed by activation)

  3. Compute the output-node values using the hidden-node values and the weights w between the hidden layer and the output layer

    (linear combination followed by activation)

    [Back Propagation]

  4. Update the weights w between the hidden layer and the output layer to reduce the difference between the computed output-node values and the actual output values
  5. Update the weights w between the input layer and the hidden layer to reduce the difference between the computed output-node values and the actual output values
  6. Repeat steps 2 to 5 until the error is sufficiently small

The Autograd Concept

pyTorch๋ฅผ ์ด์šฉํ•ด ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ• ๋•Œ ์ด๋Ÿฌํ•œ ์—ญ์ „ํŒŒ๋ฅผ ํ†ตํ•ด ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฐ”๋กœ Autograd ์ž…๋‹ˆ๋‹ค. ์ฐจ๊ทผ์ฐจ๊ทผ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด ์•Œ์•„๋ณด๋„๋ก ํ•ฉ์‹œ๋‹ค. Autograd์— ๋Œ€ํ•ด ์‚ดํŽด๋ณด๊ธฐ ์œ„ํ•ด ๊ฐ„๋‹จํ•œ MLP(Mulyi-Layer Perceptron)์„ ์˜ˆ์‹œ๋กœ ์‚ดํŽด๋ณผ๊นŒ์š”?

import torch

First, import PyTorch as shown below. Here, torch.cuda's is_available() function lets you check whether the environment your Python process is running in can use a GPU for computation.

๐Ÿ’ป Code

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

device

๐Ÿ’ป Result

# GPU ์‚ฌ์šฉ์ด ๊ฐ€๋Šฅํ•  ๋•Œ
device(type='cuda')

# GPU ์‚ฌ์šฉ์ด ๋ถˆ๊ฐ€๋Šฅํ•  ๋•Œ
device(type='cuda')

BATCH_SIZE

BATCH_SIZE is the number of data samples used in each computation when a deep learning model updates its parameters. As introduced earlier, a neural network updates its parameters by performing forward propagation and backward propagation; BATCH_SIZE is the unit (count) of data used to perform each such update. In the example below, BATCH_SIZE is set to 32, which is a hyperparameter the author is free to choose.

๐Ÿ’ป Code

# ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์ง€์ •
BATCH_SIZE = 32

INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE, LEARNING_RATE

Input_Hidden_Output

  • INPUT_SIZE is the size of the model's input, i.e., the number of nodes in the input layer.
  • HIDDEN_SIZE is the number of values computed from the input using multiple parameters, i.e., the number of nodes in the hidden layer.
  • OUTPUT_SIZE is the number of result values computed from the hidden values using multiple parameters, i.e., the number of nodes in the output layer.
  • LEARNING_RATE is a value between 0 and 1 that is multiplied into the gradient update. A small rate gives slower but finer, more careful updates; a larger rate gives faster updates.

๐Ÿ’ป Code

# ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์ง€์ •
INPUT_SIZE = 1000
HIDDEN_SIZE = 100
OUTPUT_SIZE = 2
LEARNING_RATE = 1e-6
  • ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์„ ์–ธํ–ˆ์œผ๋ฉด ์‹คํ—˜์„ ํ•ด๋ด์•ผ๊ฒ ์ฃ ? ์ผ๋‹จ ์‹คํ—˜ํ™˜๊ฒฝ์„ ์œ„ํ•ด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ž„์˜์˜ ๊ฐ’์œผ๋กœ input(X), output(Y), Weights(W1, W2)๋ฅผ ์ •์˜ํ•ด์ค๋‹ˆ๋‹ค.
  • ์ด๋•Œ, requires_grad=True๋Š” autograd ์— ๋ชจ๋“  ์—ฐ์‚ฐ(operation)๋“ค์„ ์ถ”์ ํ•ด์•ผ ํ•œ๋‹ค๊ณ  ์•Œ๋ ค์ค๋‹ˆ๋‹ค.

๐Ÿ’ป Code

# ์ž„์˜์˜ X, Y, Weight ์ •์˜
# x : input ๊ฐ’ >> (32, 1000)
x = torch.randn(BATCH_SIZE, 
                INPUT_SIZE, 
                device = device, 
                dtype = torch.float, 
                requires_grad = False)
                
# y : output values >> (32, 2)
y = torch.randn(BATCH_SIZE, 
                OUTPUT_SIZE, 
                device = device,
                dtype = torch.float, 
                requires_grad = False) 
                
# w1 : input -> hidden >> (1000, 100)
w1 = torch.randn(INPUT_SIZE, 
                 HIDDEN_SIZE, 
                 device = device, 
                 dtype = torch.float,
                 requires_grad = True)  

# w2 : hidden -> output >> (100, 2)
w2 = torch.randn(HIDDEN_SIZE,
                 OUTPUT_SIZE, 
                 device = device,
                 dtype = torch.float,
                 requires_grad = True)  

Train Model (iteration = 500)

  • ๋ณธ ํฌ์ŠคํŠธ์€ Autograd๋ฅผ ํ™•์ธํ•ด๋ณด๋Š” ํฌ์ŠคํŠธ์ด๋ฏ€๋กœ, ๋‹จ์ˆœํ•˜๊ฒŒ for๋ฌธ์„ ์ด์šฉํ•˜์—ฌ 500๋ฒˆ iteration์„ ์ˆ˜ํ–‰ํ•˜๋„๋ก ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค.
  • torch.mm() : mm์€ matrix multiplication์˜ ์ค„์ž„๋ง์œผ๋กœ, ํ–‰๋ ฌ์˜ ๊ณฑ์…ˆ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
  • torch.nn.ReLU() : ReLUํ•จ์ˆ˜, ReLU๋Š” max(0, x)๋ฅผ ์˜๋ฏธํ•˜๋Š” ํ•จ์ˆ˜์ธ๋ฐ, 0๋ณด๋‹ค ์ž‘์•„์ง€๊ฒŒ ๋˜๋ฉด 0์ด ๋˜๊ณ , ๊ทธ ์ด์ƒ์€ ๊ฐ’์„ ์œ ์ง€ํ•œ๋‹ค๋Š” ํŠน์ง•์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • loss.backward() : loss์— ๋Œ€ํ•˜์—ฌ .backward()๋ฅผ ํ˜ธ์ถœํ•œ ๊ฒƒ์œผ๋กœ, autograd๋Š” ๊ฐ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐ’์— ๋Œ€ํ•ด ๋ฏธ๋ถ„๊ฐ’(gradient)์„ ๊ณ„์‚ฐํ•˜๊ณ  ์ด๋ฅผ ๊ฐ ํ…์„œ์˜ .grad ์†์„ฑ(attribute)์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
  • with torch.no_grad() : ๋ฏธ๋ถ„๊ฐ’(gradient) ๊ณ„์‚ฐ์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋„๋ก ์„ค์ •ํ•˜๋Š” ์ปจํ…์ŠคํŠธ-๊ด€๋ฆฌ์ž(Context-manager)์ž…๋‹ˆ๋‹ค. ํ•ด๋‹น ๋ชจ๋“œ๋Š” ์ž…๋ ฅ์— requires_grad=True๊ฐ€ ์žˆ์–ด๋„, ์ด๋ฅผ requires_grad=False๋กœ ๋ฐ”๊ฟ”์ค๋‹ˆ๋‹ค.

๐Ÿ’ป Code

from torch import nn

# ReLU activation (instantiate the module before applying it)
relu = nn.ReLU()

# 500 iterations
for t in range(1, 501):
    # Hidden values
    hidden = relu(x.mm(w1))
    
    # Predicted values
    y_pred = hidden.mm(w2)
    
    # Compute the sum of squared errors (loss)
    loss = (y_pred - y).pow(2).sum()
    
    # Log every 100 iterations
    if t % 100 == 0:
        print(t, "th Iteration: ", sep = "")
        print(">>>> Loss: ", loss.item())
    
    # Loss์˜ Gradient ๊ณ„์‚ฐ
    loss.backward()                                           
	
    # Stop tracking gradients while updating
    with torch.no_grad():
    	# Weight ์—…๋ฐ์ดํŠธ
        w1 -= LEARNING_RATE * w1.grad                          
        w2 -= LEARNING_RATE * w2.grad                          
		
        # Zero out the weight gradients
        w1.grad.zero_()                                      
        w2.grad.zero_()
  • Running the loop for 500 iterations, you can see the loss gradually decrease.

๐Ÿ’ป Result

100th Iteration: 
>>>> Loss:  926.969116210
200th Iteration: 
>>>> Loss:  6.41975164413
300th Iteration: 
>>>> Loss:  0.06706248223
400th Iteration: 
>>>> Loss:  0.00112969405
500th Iteration: 
>>>> Loss:  0.00011484944

Advanced Concepts

Computational Graph

  • autograd keeps a record of the data (tensors) and all executed operations in a directed acyclic graph (DAG) consisting of Function objects.
  • The DAG represents the neural network's overall computation as a graph: the leaves are the input tensors (data) and the roots are the output tensors (data).
  • By tracing this DAG from roots to leaves, gradients can be computed automatically using the chain rule.

In the forward pass, autograd does the following two things simultaneously:

  1. runs the requested operation to compute a resulting tensor, and
  2. maintains the operation's gradient function in the DAG.

The backward pass kicks off when .backward() is called on the DAG root. autograd then performs the following three steps in order:

  1. computes the gradients from each .grad_fn,
  2. accumulates them in each tensor's .grad attribute, and
  3. using the chain rule, propagates all the way to every leaf tensor.
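
These steps can be observed directly on a tiny graph (the values here are illustrative):

```python
import torch

# Leaf tensors: the inputs (leaves) of the DAG
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# The forward pass computes results and records each op's gradient function
c = a * b   # c.grad_fn references the multiplication's backward function
d = c + a   # d is the root of this small DAG

# Calling backward() on the root walks the DAG using the chain rule
d.backward()

# Gradients are accumulated in the leaves' .grad attributes:
# dd/da = b + 1 = 4.0,  dd/db = a = 2.0
```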

Computational Graph

Related Concepts

Source: https://tutorials.pytorch.kr/beginner/nn_tutorial.html

  • The computational graph and autograd form a very powerful paradigm for defining complex operators and automatically computing derivatives. However, for large neural networks, raw autograd on its own can be too low-level.

torch.nn, torch.optim

  • PyTorch provides torch.nn and torch.optim to help with creating and training neural networks.

    • torch.nn : a package for building all kinds of neural networks.

      1. torch.nn.Module: creates a callable object that behaves like a function but can also hold state. It knows which Parameters it contains and can iterate over all of them, for example to zero their gradients or update the weights.
      2. torch.nn.Parameter: a wrapper for a tensor that tells a Module it has weights to be updated during backpropagation. Only tensors with the requires_grad attribute set are updated.
      3. torch.nn.functional: a module containing activation functions, loss functions, and so on, as well as non-stateful versions of layers such as convolutional and linear layers.
    • torch.optim: earlier, we updated the model's weights inside torch.no_grad() by directly manipulating the tensors holding the learnable parameters. That is not much of a burden for simple optimization algorithms, but when training real neural networks one usually uses more sophisticated optimizers such as AdaGrad, RMSProp, or Adam. PyTorch's optim package therefore abstracts the idea of an optimization algorithm and provides implementations of commonly used optimizers.
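
As a sketch of how the manual training loop above might look when rewritten with torch.nn and torch.optim (the choice of Adam and the learning rate here are illustrative, not from the original example):

```python
import torch
from torch import nn, optim

# The same two-layer MLP shape as above, expressed with torch.nn
model = nn.Sequential(
    nn.Linear(1000, 100),  # input -> hidden (plays the role of w1)
    nn.ReLU(),
    nn.Linear(100, 2),     # hidden -> output (plays the role of w2)
)

x = torch.randn(32, 1000)
y = torch.randn(32, 2)

# The optimizer replaces the manual "w -= LEARNING_RATE * w.grad" update
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for _ in range(10):
    y_pred = model(x)
    loss = nn.functional.mse_loss(y_pred, y)
    optimizer.zero_grad()  # replaces w.grad.zero_()
    loss.backward()
    optimizer.step()       # replaces the torch.no_grad() update block
```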

Dataset, DataLoader

  • ๋ฐ์ดํ„ฐ ์ƒ˜ํ”Œ์„ ์ฒ˜๋ฆฌํ•˜๋Š” ์ฝ”๋“œ๋Š” ์ง€์ €๋ถ„ํ•˜๊ณ  ์œ ์ง€๋ณด์ˆ˜๊ฐ€ ์–ด๋ ค์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋” ๋‚˜์€ ๊ฐ€๋…์„ฑ(readability)๊ณผ ๋ชจ๋“ˆ์„ฑ(modularity)์„ ์œ„ํ•ด ๋ฐ์ดํ„ฐ์…‹ ์ฝ”๋“œ๋ฅผ ๋ชจ๋ธ ํ•™์Šต ์ฝ”๋“œ๋กœ๋ถ€ํ„ฐ ๋ถ„๋ฆฌํ•˜๋Š” ๊ฒƒ์ด ์ด์ƒ์ ์ž…๋‹ˆ๋‹ค.
  • PyTorch๋Š” torch.utils.data.DataLoader ์™€ torch.utils.data.Dataset์˜ ๋‘ ๊ฐ€์ง€ ๋ฐ์ดํ„ฐ ๊ธฐ๋ณธ ์š”์†Œ๋ฅผ ์ œ๊ณตํ•˜์—ฌ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•ด๋œ(pre-loaded) ๋ฐ์ดํ„ฐ์…‹ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.

    • torch.utils.data.Dataset: ์ƒ˜ํ”Œ๊ณผ ์ •๋‹ต(label)์„ ์ €์žฅํ•˜๊ณ , len ๋ฐ getitem ์ด ์žˆ๋Š” ๊ฐ์ฒด์˜ ์ถ”์ƒ ์ธํ„ฐํŽ˜์ด์Šค์ž…๋‹ˆ๋‹ค.
    • torch.utils.data.DataLoader: ๋ชจ๋“  ์ข…๋ฅ˜์˜ Dataset์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐ์ดํ„ฐ์˜ ๋ฐฐ์น˜๋“ค์„ ์ถœ๋ ฅํ•˜๋Š” ๋ฐ˜๋ณต์ž(iterator)๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
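
A minimal sketch using these two primitives with the shapes from this post (TensorDataset is a ready-made Dataset for in-memory tensors; the sample count of 100 is arbitrary):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Wrap in-memory tensors in a Dataset
# (TensorDataset implements __len__ and __getitem__)
x = torch.randn(100, 1000)
y = torch.randn(100, 2)
dataset = TensorDataset(x, y)

# DataLoader iterates over the Dataset in shuffled mini-batches
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for xb, yb in loader:
    pass  # each xb is (32, 1000), except a smaller final batch of 4
```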

Thank you for reading this long post! ^~^


