(Explanation Added) Weight Initialization

Posted by Euisuk's Dev Log on February 1, 2025


Original post: https://velog.io/@euisuk-chung/์„ค๋ช…์ถ”๊ฐ€-์›จ์ดํŠธ-์ดˆ๊ธฐํ™”-Weight-Initialization

  1. ์›จ์ดํŠธ ์ดˆ๊ธฐํ™”๋ž€?

๋”ฅ๋Ÿฌ๋‹์—์„œ ์›จ์ดํŠธ ์ดˆ๊ธฐํ™”(Weight Initialization)๋Š” ์‹ ๊ฒฝ๋ง์˜ ๊ฐ€์ค‘์น˜๋ฅผ ํ•™์Šต ์ „์— ์„ค์ •ํ•˜๋Š” ๊ณผ์ •์ž…๋‹ˆ๋‹ค. ์ดˆ๊ธฐํ™” ๋ฐฉ์‹์— ๋”ฐ๋ผ ๋ชจ๋ธ์˜ ํ•™์Šต ์†๋„, ์„ฑ๋Šฅ, ์•ˆ์ •์„ฑ์ด ํฌ๊ฒŒ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • ์ ์ ˆํ•œ ์ดˆ๊ธฐํ™” ๋ฐฉ์‹์€ ํ›ˆ๋ จ์„ ๊ฐ€์†ํ™”ํ•˜๊ณ , ์ตœ์ ํ™” ๊ณผ์ •์—์„œ ์•ˆ์ •์ ์ธ ํ•™์Šต์„ ๋ณด์žฅํ•˜๋ฉฐ, ๊ทธ๋ž˜๋””์–ธํŠธ ์†Œ์‹ค ๋ฐ ํญ๋ฐœ ๋ฌธ์ œ๋ฅผ ๋ฐฉ์ง€ํ•˜๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.

ํ•ด๋‹น ๊ด€๋ จ ํ˜ํŽœํ•˜์ž„์˜ ๊ฐ•์˜๊ฐ€ ๊ถ๊ธˆํ•˜๋‹ค๋ฉด?

์•„๋ž˜๋Š” ์œ„ ๊ฐ•์˜ ๋‚ด์šฉ๊ณผ ์ œ๊ฐ€ ์ถ”๊ฐ€์ ์œผ๋กœ ์กฐ์‚ฌ ๋ฐ ์ •๋ฆฌํ•œ ๋‚ด์šฉ์„ ๋ฐ”ํƒ•์œผ๋กœ ์ž‘์„ฑํ•œ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค.

์›จ์ดํŠธ ์ดˆ๊ธฐํ™”์˜ ๋ชฉํ‘œ

์‹ ๊ฒฝ๋ง์—์„œ ๋‰ด๋Ÿฐ jjj์˜ ์ถœ๋ ฅ(ํ™œ์„ฑํ™” ๊ฐ’) zjz_jzjโ€‹๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ •์˜๋ฉ๋‹ˆ๋‹ค.

zj=โˆ‘i=1Ninwijxi+bjz_j = \sum_{i=1}^{N_{in}} w_{ij} x_i + b_jzjโ€‹=i=1โˆ‘Ninโ€‹โ€‹wijโ€‹xiโ€‹+bjโ€‹

์—ฌ๊ธฐ์„œ, ๊ฐ๊ฐ xxx, www, bbb๋Š” ์•„๋ž˜๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

  • xix_ixiโ€‹๋Š” ์ด์ „ ์ธต์˜ ์ถœ๋ ฅ(๋˜๋Š” ์ž…๋ ฅ ๋ฐ์ดํ„ฐ)
  • wijw_{ij}wijโ€‹๋Š” ๊ฐ€์ค‘์น˜
  • bjb_jbjโ€‹๋Š” ํŽธํ–ฅ(bias)

ํ™œ์„ฑํ™” ํ•จ์ˆ˜ fff๋ฅผ ์ ์šฉํ•œ ์ถœ๋ ฅ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

aj=ฯƒ(zj)a_j = \sigma(z_j)ajโ€‹=ฯƒ(zjโ€‹)

  • zjz_jzjโ€‹๋Š” ์ด์ „ ์ธต์˜ ๋‰ด๋Ÿฐ ์ถœ๋ ฅ์„ ๊ฐ€์ค‘์น˜์™€ ๊ณฑํ•œ ํ›„ ํŽธํ–ฅ์„ ๋”ํ•œ ๊ฐ’ (=์„ ํ˜• ๋ณ€ํ™˜ ๊ฒฐ๊ณผ)
  • ฯƒฯƒฯƒ๋Š” ์‹œ๊ทธ๋ชจ์ด๋“œ, ๋˜๋Š” ๋‹ค๋ฅธ ํ™œ์„ฑํ™” ํ•จ์ˆ˜

์šฐ๋ฆฌ๋Š” ๊ฐ ์ธต์—์„œ์˜ ์ถœ๋ ฅ ๋ถ„์‚ฐ์„ ์ผ์ •ํ•˜๊ฒŒ ์œ ์ง€ํ•˜๋Š” ์›จ์ดํŠธ(๊ฐ€์ค‘์น˜) ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์„ ์ฐพ๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค.
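As a quick illustration, the two formulas above can be sketched in plain Python for a single neuron (the input, weight, and bias values below are arbitrary):

```python
import math

def neuron_preactivation(x, w, b):
    """z_j = sum_i w_ij * x_i + b_j for one neuron j."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    """a_j = sigma(z_j)."""
    return 1.0 / (1.0 + math.exp(-z))

z = neuron_preactivation(x=[1.0, 2.0], w=[0.5, -0.25], b=0.1)  # 0.5*1 - 0.25*2 + 0.1 = 0.1
a = sigmoid(z)
```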

In the subsection below, we look at why these weights matter so much.

Why Does Weight Initialization Matter?

  1. Prevents vanishing and exploding gradients
    • Poor initialization can make the gradients collapse to 0 or blow up during backpropagation.
  2. Enables efficient training
    • With suitable initial values, training proceeds quickly and the optimizer more easily finds a good loss minimum.
  3. Preserves the operating range of the activation function
    • It helps keep each layer's outputs within the useful range of the activation function.
  4. Improves early convergence speed
    • Proper initialization promotes an early drop in the loss, so the model converges faster.
  5. ๋„คํŠธ์›Œํฌ ์ดˆ๊ธฐํ™”๋ž€?

๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ๋„คํŠธ์›Œํฌ ์ดˆ๊ธฐํ™”(Network Initialization)๋Š” ๋ชจ๋ธ ๋‚ด ๋ชจ๋“  ๋ ˆ์ด์–ด์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์ดˆ๊ธฐํ™”ํ•˜๋Š” ๊ณผ์ •์ž…๋‹ˆ๋‹ค. ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์„ ์‹ ์ค‘ํ•˜๊ฒŒ ์„ ํƒํ•˜์ง€ ์•Š์œผ๋ฉด ํ•™์Šต์ด ๋น„ํšจ์œจ์ ์œผ๋กœ ์ง„ํ–‰๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • MLP (Multi-Layer Perceptron): ๋ชจ๋“  Fully Connected Layer (์„ ํ˜•์ธต) ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์ดˆ๊ธฐํ™”
  • CNN (Convolutional Neural Networks): ํ•ฉ์„ฑ๊ณฑ ํ•„ํ„ฐ(Convolution Kernel) ์ „๋ถ€ ์ดˆ๊ธฐํ™”
  • RNN (Recurrent Neural Networks): ์ˆœํ™˜์ธต(Recurrent Layers) ๋ฐ ์—ฐ๊ฒฐ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™”
  • Transformer ๊ณ„์—ด ๋ชจ๋ธ: ๋ชจ๋“  Self-Attention Layer์™€ FFN Layer ์ดˆ๊ธฐํ™”

๋”ฐ๋ผ์„œ, ๋„คํŠธ์›Œํฌ ์ดˆ๊ธฐํ™”๋Š” ๋‹จ์ˆœํžˆ ์ผ๋ถ€ ๊ฐ€์ค‘์น˜๋ฅผ ๋žœ๋คํ•˜๊ฒŒ ์„ค์ •ํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, ๋ชจ๋“  ๋ ˆ์ด์–ด์˜ ๊ฐ€์ค‘์น˜๋ฅผ ํŠน์ • ๋ฐฉ์‹์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•˜๋Š” ๊ณผ์ •์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ถ€ ์ดˆ๊ธฐํ™” ๊ธฐ๋ฒ•์€ ํŠน์ • ํ™œ์„ฑํ™” ํ•จ์ˆ˜์™€ ์กฐํ•ฉํ•˜์—ฌ ์‚ฌ์šฉํ•ด์•ผ ์ตœ์ ์˜ ์„ฑ๋Šฅ์„ ๋ฐœํœ˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ’ป (์ฐธ๊ณ ) ๋ถ„์‚ฐ ์ฆ๊ฐ€์™€ ์‹œ๊ทธ๋ชจ์ด๋“œ์˜ ํฌํ™” ํ˜„์ƒ

์‹ ๊ฒฝ๋ง์—์„œ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™”๊ฐ€ ์ค‘์š”ํ•œ ์ด์œ  ์ค‘ ํ•˜๋‚˜๋Š” ํ™œ์„ฑํ™” ํ•จ์ˆ˜์˜ ํŠน์„ฑ๊ณผ ํ•™์Šต ์•ˆ์ •์„ฑ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ํŠนํžˆ ์‹œ๊ทธ๋ชจ์ด๋“œ(Sigmoid)์™€ ๊ฐ™์€ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ, ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™”๊ฐ€ ์ ์ ˆํ•˜์ง€ ์•Š์œผ๋ฉด ๋ถ„์‚ฐ์ด ์ปค์ ธ ํฌํ™”(Saturation) ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

1. ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์˜ ํŠน์„ฑ

์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜๋Š” ์ž…๋ ฅ๊ฐ’ zzz์— ๋Œ€ํ•ด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ •์˜๋ฉ๋‹ˆ๋‹ค.

ฯƒ(z)=11+eโˆ’z\sigma(z) = \frac{1}{1 + e^{-z}}ฯƒ(z)=1+eโˆ’z1โ€‹

์ด ํ•จ์ˆ˜์˜ ์ค‘์š”ํ•œ ํŠน์„ฑ์€:

  • zzz๊ฐ€ ๋งค์šฐ ํฌ๋ฉด ฯƒ(z)โ‰ˆ1\sigma(z) \approx 1ฯƒ(z)โ‰ˆ1
  • zzz๊ฐ€ ๋งค์šฐ ์ž‘์œผ๋ฉด ฯƒ(z)โ‰ˆ0\sigma(z) \approx 0ฯƒ(z)โ‰ˆ0
  • z=0z = 0z=0์ผ ๋•Œ ฯƒ(0)=0.5\sigma(0) = 0.5ฯƒ(0)=0.5

์ฆ‰, ์ž…๋ ฅ๊ฐ’์ด ๋„ˆ๋ฌด ํฌ๊ฑฐ๋‚˜ ์ž‘์•„์ง€๋ฉด ์‹œ๊ทธ๋ชจ์ด๋“œ์˜ ์ถœ๋ ฅ์ด 0 ๋˜๋Š” 1๋กœ ์ˆ˜๋ ดํ•˜๋ฉด์„œ ๋ฏธ๋ถ„๊ฐ’์ด 0์— ๊ฐ€๊นŒ์›Œ์ง€๋Š” ํฌํ™” ํ˜„์ƒ์ด ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

2. ๋ถ„์‚ฐ ์ฆ๊ฐ€์™€ ํฌํ™” ํ˜„์ƒ

์ถœ๋ ฅ ๋‰ด๋Ÿฐ์˜ ํ™œ์„ฑํ™” ๊ฐ’ zjz_jzjโ€‹๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋ฉ๋‹ˆ๋‹ค.

zj=โˆ‘i=1Ninwijxi+bjz_j = \sum_{i=1}^{N_{in}} w_{ij} x_i + b_jzjโ€‹=i=1โˆ‘Ninโ€‹โ€‹wijโ€‹xiโ€‹+bjโ€‹

์—ฌ๊ธฐ์„œ xix_ixiโ€‹๋Š” ์ž…๋ ฅ๊ฐ’์ด๊ณ , wijw_{ij}wijโ€‹๋Š” ๊ฐ€์ค‘์น˜์ž…๋‹ˆ๋‹ค. ๋งŒ์•ฝ ์ž…๋ ฅ๊ฐ’๊ณผ ๊ฐ€์ค‘์น˜๊ฐ€ ๋…๋ฆฝ์ ์ธ ์ •๊ทœ ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅธ๋‹ค๋ฉด, ์ค‘์•™๊ทนํ•œ์ •๋ฆฌ(CLT)์— ์˜ํ•ด zjz_jzjโ€‹๋„ ์ •๊ทœ ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

zjโˆผN(0,Ninโ‹…ฯƒx2โ‹…ฯƒw2)z_j \sim \mathcal{N}(0, N_{in} \cdot \sigma_x^2 \cdot \sigma_w^2)zjโ€‹โˆผN(0,Ninโ€‹โ‹…ฯƒx2โ€‹โ‹…ฯƒw2โ€‹)

์ฆ‰, ์ž…๋ ฅ ๊ฐœ์ˆ˜ NinN_{in}Ninโ€‹์ด ์ปค์งˆ์ˆ˜๋ก zjz_jzjโ€‹์˜ ๋ถ„์‚ฐ์ด ์ฆ๊ฐ€ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

  • ์ด ๊ฒฝ์šฐ, zjz_jzjโ€‹ ๊ฐ’์ด ๊ทน๋‹จ์ ์œผ๋กœ ์ปค์งˆ ํ™•๋ฅ ์ด ๋†’์•„์ง€๊ณ , ์‹œ๊ทธ๋ชจ์ด๋“œ ์ถœ๋ ฅ๊ฐ’์ด 0 ๋˜๋Š” 1๋กœ ์ˆ˜๋ ดํ•˜๋Š” ํฌํ™” ์˜์—ญ์— ๋“ค์–ด๊ฐ€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
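A small Monte-Carlo sketch (plain Python, arbitrary sizes) makes this growth visible: with unscaled unit-variance weights, the standard deviation of $z$ grows like $\sqrt{N_{in}}$, landing a typical $z$ squarely in the sigmoid's saturated region:

```python
import math
import random

random.seed(0)

def empirical_std_z(n_in, trials=500):
    """Empirical std of z = sum_i w_i * x_i with x_i, w_i ~ N(0, 1)."""
    zs = []
    for _ in range(trials):
        zs.append(sum(random.gauss(0, 1) * random.gauss(0, 1) for _ in range(n_in)))
    mean = sum(zs) / len(zs)
    return math.sqrt(sum((z - mean) ** 2 for z in zs) / len(zs))

std_small = empirical_std_z(n_in=16)   # ~ sqrt(16)  = 4
std_large = empirical_std_z(n_in=400)  # ~ sqrt(400) = 20

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

saturated = sigmoid(std_large)  # a typical |z| for N_in=400 is deep in saturation
```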

3. What Happens When Saturation Occurs?

  • Vanishing gradients

    • The derivative of the sigmoid is $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, so when $\sigma(z) \approx 0$ or $\sigma(z) \approx 1$ the derivative is close to 0.
    • As a result, gradients vanish during backpropagation and the network becomes hard to train.
  • Slower training

    • With most neurons stuck at 0 or 1, neurons stop responding to their inputs, which can lead to a dead-neuron problem.

4. The Fix: Proper Weight Initialization Schemes

To address these problems, a variety of initialization schemes that control the initial variance of the weights have been proposed.

The following methods keep the variance from growing too large and thereby prevent the saturation problem.

  • LeCun initialization ($w \sim \mathcal{N}(0, \frac{1}{N_{in}})$)
    • Mainly used with nonlinear activations such as Sigmoid and Tanh.
  • Xavier (Glorot) initialization ($w \sim \mathcal{N}(0, \frac{2}{N_{in} + N_{out}})$)
    • Scales the weights using both the input and output counts to encourage stable training.
  • He (Kaiming) initialization ($w \sim \mathcal{N}(0, \frac{2}{N_{in}})$)
    • Suited to ReLU-family activations; scales the variance by the input count.
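To make the three formulas concrete, here is the standard deviation each scheme prescribes for a hypothetical layer with $N_{in} = 512$ and $N_{out} = 256$ (sizes chosen arbitrarily for illustration):

```python
import math

n_in, n_out = 512, 256  # arbitrary example layer sizes

lecun_std = math.sqrt(1 / n_in)             # LeCun:  Var[w] = 1 / N_in
xavier_std = math.sqrt(2 / (n_in + n_out))  # Xavier: Var[w] = 2 / (N_in + N_out)
he_std = math.sqrt(2 / n_in)                # He:     Var[w] = 2 / N_in
```

He initialization always prescribes a std $\sqrt{2}$ times larger than LeCun's for the same layer, and Xavier sits between the two whenever $N_{out} < N_{in}$.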

๐Ÿ’Œ ์‹ ๊ฒฝ๋ง ํ•™์Šต์˜ ์•ˆ์ •์„ฑ์„ ์œ„ํ•ด ์ ์ ˆํ•œ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ž˜๋ชป๋œ ์ดˆ๊ธฐํ™”๋Š” ๋ถ„์‚ฐ ์ฆ๊ฐ€(Variance Explosion) ๋˜๋Š” ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค(Vanishing Gradient) ๋ฌธ์ œ๋ฅผ ์ผ์œผํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ๊ฐ€์ค‘์น˜๋ฅผ ์ ์ ˆํžˆ ์ดˆ๊ธฐํ™”ํ•˜์—ฌ ์ˆœ์ „ํŒŒ(Forward Propagation)์™€ ์—ญ์ „ํŒŒ(Backward Propagation)์—์„œ ์‹ ํ˜ธ์˜ ๋ถ„์‚ฐ์„ ์ผ์ •ํ•˜๊ฒŒ ์œ ์ง€ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

์ด๋Š” ์•„๋ž˜ 3. ์ฃผ์š” ์›จ์ดํŠธ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์—์„œ ์ž์„ธํ•˜๊ฒŒ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

  1. ์ฃผ์š” ์›จ์ดํŠธ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•

3.0 PyTorch and Keras Initialization Modules

  • PyTorch's torch.nn.init and Keras's tensorflow.keras.initializers modules provide a wide range of initialization methods. With them you can easily apply the initialization appropriate to each layer of a network.

📌 PyTorch initialization module

import torch
import torch.nn.init as init

tensor = torch.empty(64, 128)  # example weight tensor, initialized in place below

# Basic initialization methods
init.uniform_(tensor)  # uniform-distribution initialization
init.normal_(tensor)   # normal-distribution initialization
init.xavier_uniform_(tensor)  # Xavier uniform initialization
init.kaiming_uniform_(tensor, nonlinearity='relu')  # He initialization
init.orthogonal_(tensor)  # orthogonal initialization
init.constant_(tensor, 0)  # constant initialization

๐Ÿ“Œ Keras ์ดˆ๊ธฐํ™” ๋ชจ๋“ˆ

from tensorflow.keras.initializers import GlorotUniform, HeNormal, Orthogonal

initializer = GlorotUniform()  # Xavier initialization
initializer = HeNormal()  # He initialization
initializer = Orthogonal(gain=1.0)  # orthogonal initialization

3.1 LeCun Initialization

  • Proposed by: Yann LeCun (1998)
  • Target activation functions: Sigmoid, Tanh, SELU
  • Initialization formula: $w \sim \mathcal{N}(0, \frac{1}{N_{in}})$
    • $N_{in}$: number of input nodes

LeCun initialization was designed to suit the sigmoid and tanh activation functions.

(Note) The sigmoid and tanh activation functions

The sigmoid function

$\sigma(z) = \frac{1}{1 + e^{-z}}$

  • Output range: (0, 1)
  • Gradient (derivative): $\sigma'(z) = \sigma(z)(1 - \sigma(z))$
  • Analysis:
    • When $z$ is large, $\sigma(z) \approx 1$; when $z$ is small, $\sigma(z) \approx 0$, so the output barely changes (saturation).
    • In the saturated regions the gradient $\sigma'(z)$ is nearly 0, causing the vanishing-gradient problem.

The tanh function

$\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$

  • Output range: (-1, 1)
  • Gradient (derivative): $\tanh'(z) = 1 - \tanh^2(z)$
  • Analysis:
    • When $|z|$ is large, $\tanh(z) \approx \pm 1$, so the gradient $\tanh'(z) \to 0$.
    • Again, the vanishing-gradient problem occurs.

💬 As we can see above, the sigmoid's gradient is capped at 0.25 at its center, and tanh's gradient is 1 at $z = 0$ but decays toward 0 as $|z|$ grows. Consequently,

  • If the initial weights are too large, the growing variance pushes the activations into saturation, and the vanishing-gradient problem can occur.
  • Conversely, if the initial weights are too small, every neuron's output clusters near the center of the activation (0.5 for sigmoid, 0 for tanh), the gradients shrink as well, and training becomes inefficient.

Variance growth of $z_j$

  • In a neural network, the pre-activation value $z_j$ of a neuron is defined as:

    $z_j = \sum_{i=1}^{N_{in}} w_{ij} x_i + b_j$

  • If the weights $w_{ij}$ have large variance, the variance of $z_j$ grows as well.

    • If $w_{ij} \sim \mathcal{N}(0, \sigma_w^2)$, then: $\text{Var}[z_j] = N_{in} \cdot \text{Var}[w] \cdot \text{Var}[x]$
  • That is, the larger the weight variance, the larger the variance of $z_j$.

๐Ÿ’ก LeCun ์ดˆ๊ธฐํ™”๊ฐ€ ์ด๋ฅผ ์–ด๋–ป๊ฒŒ ๋ฐฉ์ง€ํ•˜๋Š”๊ฐ€?

  • LeCun ์ดˆ๊ธฐํ™”๋Š” ๊ฐ€์ค‘์น˜์˜ ๋ถ„์‚ฐ์„ ์ž‘๊ฒŒ ์„ค์ •ํ•˜์—ฌ zjz_jzjโ€‹๊ฐ€ ํฌํ™” ์˜์—ญ์— ๋„๋‹ฌํ•˜์ง€ ์•Š๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.
  • LeCun ์ดˆ๊ธฐํ™”๋Š” ์ž‘์€ ๋ถ„์‚ฐ( 1Nin\frac{1}{N_{in}}Ninโ€‹1โ€‹ )์„ ์‚ฌ์šฉํ•˜์—ฌ ์‹œ๊ทธ๋ชจ์ด๋“œ ๋ฐ Tanh์˜ ํฌํ™” ๋ฌธ์ œ๋ฅผ ๋ฐฉ์ง€ํ•ฉ๋‹ˆ๋‹ค.

    • LeCun ์ดˆ๊ธฐํ™”๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด โ†’ zzz ๊ฐ’์ด 0 ๊ทผ์ฒ˜์— ๋จธ๋ฌด๋ฆ„ โ†’ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๊ฐ€ ๊ธฐ์šธ๊ธฐ๋ฅผ ์œ ์ง€ํ•˜์—ฌ ์•ˆ์ •์ ์œผ๋กœ ํ•™์Šต๋จ
  • LeCun ์ดˆ๊ธฐํ™” (2์ข…๋ฅ˜)

    • ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๋Š” ๊ฒฝ์šฐ:wijโˆผN(0,1Nin)w_{ij} \sim \mathcal{N}(0, \frac{1}{N_{in}})wijโ€‹โˆผN(0,Ninโ€‹1โ€‹)
    • ๊ท ๋“ฑ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๋Š” ๊ฒฝ์šฐ:wijโˆผU(โˆ’1Nin,1Nin)w_{ij} \sim \mathcal{U} \left(-\frac{1}{\sqrt{N_{in}}}, \frac{1}{\sqrt{N_{in}}} \right)wijโ€‹โˆผU(โˆ’Ninโ€‹โ€‹1โ€‹,Ninโ€‹โ€‹1โ€‹)

์•„๋ž˜๋Š” ํŒŒ์ดํ† ์น˜์™€ ์ผ€๋ผ์Šค์— ์ ์šฉํ•œ ์˜ˆ์‹œ ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

๐Ÿ“Œ ์ ์šฉ ์˜ˆ์‹œ (PyTorch)

import math

import torch.nn as nn
import torch.nn.init as init

def init_weights(m):
    if isinstance(m, nn.Linear):
        # LeCun normal: std = sqrt(1 / N_in), i.e. variance 1 / N_in
        init.normal_(m.weight, mean=0, std=1 / math.sqrt(m.in_features))
        init.zeros_(m.bias)

model.apply(init_weights)

๐Ÿ“Œ ์ ์šฉ ์˜ˆ์‹œ (Keras)

from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import LecunNormal

layer = Dense(64, activation='selu', kernel_initializer=LecunNormal())

3.2 Xavier (Glorot) Initialization

  • Proposed by: Xavier Glorot & Yoshua Bengio (2010)
  • Target activation functions: Sigmoid, Tanh
  • Initialization formula: $w \sim \mathcal{N}(0, \frac{2}{N_{in} + N_{out}})$
    • $N_{in}$: number of input nodes
    • $N_{out}$: number of output nodes

(Note) LeCun initialization vs. Xavier initialization

  • LeCun initialization sets the weight variance to $\frac{1}{N_{in}}$ to keep $z$ from growing too large in the forward pass.

    • However, it does not account for the variance of the gradients in the backward pass.
  • Xavier initialization improves on this so that the gradient variance also stays constant during backpropagation,

    • taking the output node count $N_{out}$ into account as well when scaling down the variance.
    • That is, it is designed so the signal is not lost as it propagates through the network, but stays stable.

💡 In short, Xavier initialization extends the idea of LeCun initialization so that the variance stays constant not only in the forward pass but also in the backward pass.

์•„๋ž˜๋Š” ํŒŒ์ดํ† ์น˜์™€ ์ผ€๋ผ์Šค์— ์ ์šฉํ•œ ์˜ˆ์‹œ ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

๐Ÿ“Œ ์ ์šฉ ์˜ˆ์‹œ (PyTorch)

import torch.nn as nn
import torch.nn.init as init

def init_weights(m):
    if isinstance(m, nn.Linear):
        init.xavier_uniform_(m.weight)
        init.zeros_(m.bias)

model.apply(init_weights)

๐Ÿ“Œ ์ ์šฉ ์˜ˆ์‹œ (Keras)

from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import GlorotUniform

layer = Dense(64, activation='tanh', kernel_initializer=GlorotUniform())

3.3 He (Kaiming) Initialization

  • Proposed by: Kaiming He (2015)
  • Target activation functions: ReLU, Leaky ReLU
  • Initialization formula: $w \sim \mathcal{N}(0, \frac{2}{N_{in}})$
    • $N_{in}$: number of input nodes

(Note) The ReLU and Leaky ReLU activation functions

(1) ReLU: an asymmetric activation function

  • ReLU maps every negative input to 0.

$\text{ReLU}(x) = \max(0, x)$

That is,

  • Positive inputs: passed through unchanged.
  • Negative inputs: set to 0.

  1. Half of the neurons are deactivated (the Dead Neurons problem)

    • A neuron that receives negative input always outputs 0 → a high chance that the neuron dies during training.
  2. The outputs are skewed to one side → the variance can shrink

    • With a Xavier-style variance of $\frac{2}{N_{in} + N_{out}}$ or a LeCun-style $\frac{1}{N_{in}}$, the variance becomes too small and the ReLU units may fail to activate properly.
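The shrinking effect is easy to verify: for a zero-mean input, ReLU discards the negative half, so the second moment of the output is about half that of the input. A plain-Python sketch:

```python
import random

random.seed(3)

xs = [random.gauss(0, 1) for _ in range(200_000)]

def relu(x):
    return max(0.0, x)

second_moment_in = sum(x * x for x in xs) / len(xs)          # ~ 1.0
second_moment_out = sum(relu(x) ** 2 for x in xs) / len(xs)  # ~ 0.5: half the power is gone
```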

(2) Leaky ReLU: a softer variant, but the skew problem remains

  • Leaky ReLU is a variant that applies a small slope $\alpha$ to negative inputs to mitigate ReLU's Dead Neurons problem.

$\text{Leaky ReLU}(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \alpha x, & \text{if } x < 0 \end{cases}$

  • However, Leaky ReLU's outputs can still be skewed to one side, so proper weight initialization is still required.

๐Ÿ’ก He (Kaiming) ์ดˆ๊ธฐํ™”์˜ ํƒ„์ƒ ๋น„ํ™”

(1) Xavier ์ดˆ๊ธฐํ™”์˜ ํ•œ๊ณ„์ 

  • Xavier ์ดˆ๊ธฐํ™”๋Š” ์‹œ๊ทธ๋ชจ์ด๋“œ ๋ฐ Tanh ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ์œ„ํ•ด ์„ค๊ณ„๋˜์—ˆ์œผ๋ฉฐ, ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ถ„์‚ฐ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

Var[w]=2Nin+Nout\text{Var}[w] = \frac{2}{N_{in} + N_{out}}Var[w]=Ninโ€‹+Noutโ€‹2โ€‹

  • ๊ทธ๋Ÿฌ๋‚˜ Xavier ์ดˆ๊ธฐํ™”๋Š” ReLU ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ๊ณ ๋ คํ•˜์ง€ ์•Š์•˜๊ธฐ ๋•Œ๋ฌธ์—, ReLU์—์„œ๋Š” ์‹ ๊ฒฝ๋ง์ด ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ํ•™์Šต๋˜์ง€ ์•Š์„ ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

    • Xavier ์ดˆ๊ธฐํ™”๋Š” ์ถœ๋ ฅ๊ฐ’์ด ํ‰๊ท  0์„ ์œ ์ง€ํ•˜๋„๋ก ์„ค๊ณ„๋˜์—ˆ์œผ๋‚˜,
    • ReLU๋Š” ์ถœ๋ ฅ๊ฐ’์ด ํ•ญ์ƒ 0 ์ด์ƒ์ด๋ฏ€๋กœ ํ‰๊ท ์ด 0๋ณด๋‹ค ํผ โ†’ Xavier ์ดˆ๊ธฐํ™”๋ฅผ ์ ์šฉํ•˜๋ฉด ์ถœ๋ ฅ๊ฐ’์ด ๋„ˆ๋ฌด ์ž‘์•„์ง€๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Œ.
  • ์ฆ‰, Xavier ์ดˆ๊ธฐํ™”๋Š” ReLU์—์„œ ์ถœ๋ ฅ๊ฐ’์˜ ํฌ๊ธฐ๊ฐ€ ๊ธ‰๊ฒฉํžˆ ์ž‘์•„์งˆ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์Œ.

(2) Why He initialization multiplies by 2

  • He initialization accounts for the asymmetry of ReLU-family activations by giving the weights a larger variance:

$\text{Var}[w] = \frac{2}{N_{in}}$

  • That is, it uses twice the variance of LeCun initialization.

🤔 Q. Why twice the variance of LeCun initialization?

  1. ReLU zeroes out negative values, reducing the signal

    • Xavier initialization aims to keep the output mean at 0,
    • but ReLU sends every negative value to 0, so the output mean becomes greater than 0.
    • As a result, the variance carried forward can become too small.
  2. The variance is doubled to compensate for the reduction

    • Only about half of a ReLU layer's inputs are active → counting only the activated neurons, the surviving signal is weaker.
    • To compensate, the weight variance is doubled so that training can proceed smoothly.

=> In short, He initialization adjusts the variance to account for the fraction of ReLU neurons that are deactivated.
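The factor of 2 can be checked empirically. The sketch below (plain Python, small arbitrary sizes) pushes a signal through a stack of ReLU layers: with He scaling ($\text{Var}[w] = 2/N_{in}$) the signal's magnitude stays roughly constant, while LeCun scaling ($1/N_{in}$) halves the power at every layer:

```python
import math
import random

random.seed(4)

def relu_layer(x, var_w):
    """One fully connected ReLU layer; each output draws fresh N(0, var_w) weights."""
    std = math.sqrt(var_w)
    return [max(0.0, sum(random.gauss(0, std) * xi for xi in x)) for _ in range(len(x))]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

n = 256
x0 = [random.gauss(0, 1) for _ in range(n)]

he, lecun = x0[:], x0[:]
for _ in range(10):
    he = relu_layer(he, 2 / n)        # He: power preserved through ReLU
    lecun = relu_layer(lecun, 1 / n)  # LeCun: power halves at each layer

rms_he = rms(he)        # stays near 1
rms_lecun = rms(lecun)  # shrinks toward (1/sqrt(2))**10, roughly 0.03
```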

์•„๋ž˜๋Š” ํŒŒ์ดํ† ์น˜์™€ ์ผ€๋ผ์Šค์— ์ ์šฉํ•œ ์˜ˆ์‹œ ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

๐Ÿ“Œ ์ ์šฉ ์˜ˆ์‹œ (PyTorch)

import torch.nn as nn
import torch.nn.init as init

def init_weights(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        init.kaiming_uniform_(m.weight, nonlinearity='relu')
        if m.bias is not None:  # layers may be created with bias=False
            init.zeros_(m.bias)

model.apply(init_weights)

๐Ÿ“Œ ์ ์šฉ ์˜ˆ์‹œ (Keras)

from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import HeNormal

layer = Dense(64, activation='relu', kernel_initializer=HeNormal())

Summary

Here is how each initialization method controls the variance:

Initialization | Weight distribution | Activation functions | Key property
LeCun | $w \sim \mathcal{N}(0, \frac{1}{N_{in}})$ | Sigmoid, Tanh | Prevents sigmoid saturation
Xavier (Glorot) | $w \sim \mathcal{N}(0, \frac{2}{N_{in} + N_{out}})$ | Sigmoid, Tanh | Keeps the variance stable in the forward and backward passes
He (Kaiming) | $w \sim \mathcal{N}(0, \frac{2}{N_{in}})$ | ReLU, Leaky ReLU | Compensates for the fraction of active ReLU neurons

Each of these schemes is designed to keep the variance from growing or shrinking too much during forward and backward propagation.


  4. Criteria for Choosing an Initialization Method

Training performance can vary greatly with the initialization scheme.

  • It is therefore important to pick the initialization that matches the activation function and network architecture you intend to use.

🙌 Here are some tips to consider when choosing an initialization method:

  1. Match the activation function:

    • The best initialization depends on the activation function.
    • For example, He initialization suits ReLU-family functions, while Xavier initialization suits sigmoid and tanh.
  2. Adjust for network depth and architecture:

    • The deeper the network, the more prone it is to vanishing or exploding gradients.
    • To mitigate this, set each layer's weight initialization carefully.
  3. Combine with batch normalization:

    • Batch normalization normalizes each layer's input distribution and stabilizes training.
    • Used together with a good initialization, it can yield better performance.
  4. Interaction with dropout:

    • When using dropout, the weight initialization can affect how well dropout works.
    • Choose the initialization so that it complements dropout rather than fighting it.
  5. Relationship between learning rate and initialization:

    • The optimal learning rate can depend on the weight initialization. After choosing an initialization method, it is important to set a learning rate that matches it.
  6. Combine with regularization techniques:

    • Techniques such as L1 and L2 regularization can be used alongside initialization to prevent overfitting and improve generalization.
  7. Validate initialization methods experimentally:

    • The best initialization can differ by model and dataset, so it is worth experimenting with several methods and picking the one that fits best.

Taking these tips into account when setting up weight initialization can further improve a deep learning model's training efficiency and performance.

โ“ (์ฐธ๊ณ ) ์›จ์ดํŠธ๋ฅผ 0 ๋˜๋Š” 1๋กœ ์„ค์ •ํ•˜๋ฉด ์–ด๋–ค ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‚˜์š”?

  • ์›จ์ดํŠธ๋ฅผ 0 ๋˜๋Š” 1๋กœ ์ดˆ๊ธฐํ™”ํ•˜๋ฉด ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
    • 0์œผ๋กœ ์„ค์ •ํ•  ๊ฒฝ์šฐ: ๋ชจ๋“  ๋‰ด๋Ÿฐ์ด ๋™์ผํ•œ ์ถœ๋ ฅ๊ณผ ๋™์ผํ•œ ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ ๊ฐ€์ง€๊ฒŒ ๋˜๋ฉฐ, ํ•™์Šต์ด ์ง„ํ–‰๋˜์ง€ ์•Š์Œ.
    • 1๋กœ ์„ค์ •ํ•  ๊ฒฝ์šฐ: ๊ทธ๋ž˜๋””์–ธํŠธ๊ฐ€ ๋„ˆ๋ฌด ํฌ๊ฑฐ๋‚˜ ๋„ˆ๋ฌด ์ž‘์•„์งˆ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋„คํŠธ์›Œํฌ๊ฐ€ ์ œ๋Œ€๋กœ ํ•™์Šต๋˜์ง€ ์•Š์Œ.
    • ๋žœ๋ค ์ดˆ๊ธฐํ™”์˜ ํ•„์š”์„ฑ: ๋‰ด๋Ÿฐ๋“ค์ด ๊ฐ๊ธฐ ๋‹ค๋ฅธ ํŠน์ง•์„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ๊ฐ€์ค‘์น˜๋ฅผ ๋ฌด์ž‘์œ„๋กœ ์ดˆ๊ธฐํ™”ํ•ด์•ผ ํ•จ.

โ“(์ฐธ๊ณ ) ๊ทธ๋Ÿฌ๋ฉด ์ผ๋ฐ˜ ์ •๊ทœ๋ถ„ํฌ๋Š” ์ „ํ˜€ ์•ˆ ์“ฐ๋‚˜์š”?

  • ํŒŒ์ดํ† ์น˜์—์„œ torch.nn.init.normal_ ํ•จ์ˆ˜๋Š” ํ…์„œ์˜ ๊ฐ’์„ ์ •๊ทœ ๋ถ„ํฌ(Normal Distribution)๋ฅผ ๋”ฐ๋ฅด๋„๋ก ์ดˆ๊ธฐํ™”ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
    • ์ด ํ•จ์ˆ˜๋Š” ํ‰๊ท (mean)๊ณผ ํ‘œ์ค€ํŽธ์ฐจ(std)๋ฅผ ์ง€์ •ํ•˜์—ฌ, ํ•ด๋‹น ๋ถ„ํฌ์— ๋”ฐ๋ผ ํ…์„œ์˜ ์š”์†Œ๋“ค์„ ๋ฌด์ž‘์œ„๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
  • ์ผ๋ฐ˜์ ์œผ๋กœ, ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ์‹œ์—๋Š” ํ™œ์„ฑํ™” ํ•จ์ˆ˜์™€ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ์— ์ตœ์ ํ™”๋œ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์ด ๊ถŒ์žฅ๋ฉ๋‹ˆ๋‹ค.
    • ์˜ˆ๋ฅผ ๋“ค์–ด, ReLU ํ™œ์„ฑํ™” ํ•จ์ˆ˜์—๋Š” He ์ดˆ๊ธฐํ™”, Sigmoid๋‚˜ Tanh ํ•จ์ˆ˜์—๋Š” Xavier ์ดˆ๊ธฐํ™”๊ฐ€ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•๋“ค์€ ๊ฐ ์ธต์˜ ์ž…๋ ฅ ๋ฐ ์ถœ๋ ฅ ๋…ธ๋“œ ์ˆ˜๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ๊ฐ€์ค‘์น˜๋ฅผ ์ดˆ๊ธฐํ™”ํ•จ์œผ๋กœ์จ, ํ•™์Šต์˜ ์•ˆ์ •์„ฑ๊ณผ ์ˆ˜๋ ด ์†๋„๋ฅผ ํ–ฅ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค.
  • ๋ฐ˜๋ฉด์—, torch.nn.init.normal_๊ณผ ๊ฐ™์€ ์ผ๋ฐ˜์ ์ธ ์ •๊ทœ ๋ถ„ํฌ ์ดˆ๊ธฐํ™”๋Š” ์ด๋Ÿฌํ•œ ์ตœ์ ํ™”๋œ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์— ๋น„ํ•ด ํ•™์Šต ํšจ์œจ์ด ๋–จ์–ด์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ํŠน์ •ํ•œ ์ด์œ ๋‚˜ ์‹คํ—˜์ ์ธ ๋ชฉ์ ์ด ์—†๋Š” ํ•œ, ์ผ๋ฐ˜์ ์ธ ์ •๊ทœ ๋ถ„ํฌ ์ดˆ๊ธฐํ™”๋Š” ๋งŽ์ด ์‚ฌ์šฉ๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
  • ๊ทธ๋Ÿฌ๋‚˜, ํŠน์ •ํ•œ ๋ชจ๋ธ์ด๋‚˜ ์‹คํ—˜์—์„œ๋Š” torch.nn.init.normal_์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
    • ์˜ˆ๋ฅผ ๋“ค์–ด, GAN(Generative Adversarial Network) ๋ชจ๋ธ์˜ ๊ตฌํ˜„์—์„œ ๊ฐ€์ค‘์น˜๋ฅผ ์ดˆ๊ธฐํ™”ํ•  ๋•Œ, ํ‰๊ท ์ด 0์ด๊ณ  ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 0.02์ธ ์ •๊ทœ ๋ถ„ํฌ๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค.

  5. Conclusion

Weight initialization is a key factor that determines a deep learning model's first steps.

  • Choosing a proper initialization keeps training stable and reduces problems such as vanishing or exploding gradients.

In brief:

  • ReLU family → He initialization
  • Sigmoid, Tanh → Xavier initialization
  • SELU → LeCun initialization
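As a convenience, the pairings above can be captured in a tiny lookup helper (the identifiers below are illustrative, not from any library):

```python
# Rule-of-thumb mapping from activation function to initialization scheme,
# reflecting the summary above (names are illustrative).
RECOMMENDED_INIT = {
    "relu": "He",
    "leaky_relu": "He",
    "sigmoid": "Xavier",
    "tanh": "Xavier",
    "selu": "LeCun",
}

def pick_initializer(activation: str) -> str:
    return RECOMMENDED_INIT[activation.lower()]
```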

๋ชจ๋ธ์„ ์ฒ˜์Œ๋ถ€ํ„ฐ ์ œ๋Œ€๋กœ ์„ธํŒ…ํ•˜๋ฉด ํ•™์Šต ์†๋„๋„ ๋นจ๋ผ์ง€๊ณ , ์›ํ•˜๋Š” ์„ฑ๋Šฅ์„ ๋” ์‰ฝ๊ฒŒ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ฝ์–ด์ฃผ์…”์„œ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค! ๐Ÿ˜Šโœจ


