[๋”ฅ๋Ÿฌ๋‹][๊ฐœ๋…์ •๋ฆฌ] Inductive Bias๋ž€?

Posted by Euisuk's Dev Log on June 6, 2021

[๋”ฅ๋Ÿฌ๋‹][๊ฐœ๋…์ •๋ฆฌ] Inductive Bias๋ž€?

Inductive Bias

Overview

Inductive Bias๋ž€ ๋ฌด์—‡์ผ๊นŒ์š”? ์ตœ๊ทผ ๋…ผ๋ฌธ๋“ค์„ ๋ณด๋ฉด ๊ทธ๋ƒฅ Bias๋„ ์•„๋‹ˆ๊ณ  Inductive Bias๋ผ๋Š” ๋ง์ด ์ž์ฃผ ๋‚˜์˜ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ์š”! ์˜ค๋Š˜์€ ํ•ด๋‹น ๊ฐœ๋…์— ๋Œ€ํ•ด ์ •๋ฆฌํ•ด๋ณด๋Š” ์‹œ๊ฐ„์„ ๊ฐ€์ง€๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๊ณต๋ถ€ํ•˜๋ฉด์„œ ์ž‘์„ฑํ•œ ๊ธ€์ด๋ฏ€๋กœ ํ‹€๋ฆฐ ๊ฐœ๋…์ด ์žˆ๋‹ค๋ฉด ์–ธ์ œ๋“  ํ”ผ๋“œ๋ฐฑ ํ™˜์˜ํ•ฉ๋‹ˆ๋‹ค.

Bias(ํŽธํ–ฅ) and Variance(๋ถ„์‚ฐ)

์ด๋ฏธ ํ•ด๋‹น ํฌ์ŠคํŒ…์„ ์ฝ๋Š” ๋ถ„๋“ค์€ ์•„์‹œ๊ฒ ์ง€๋งŒ, ๋จผ์ € ๊ฐ„๋‹จํ•˜๊ฒŒ Bias์™€ Variance์˜ ๊ฐœ๋…์„ ์งš๊ณ  ๋„˜์–ด๊ฐ€๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

๋จผ์ € Error๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด Bias์™€ Variance๋กœ ๋ถ„ํ•ด๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Error Definition

์œ„ ์ˆ˜์‹์„ ํ†ตํ•ด ์•Œ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • Bias(์˜ ์ œ๊ณฑ)์€ ์‹ค์ œ๊ฐ’(F^โˆ—(X0)\widehat{F}^{*}(X_{0})Fโˆ—(X0โ€‹))๊ณผ ์˜ˆ์ธก๊ฐ’๋“ค์˜ ํ‰๊ท (Fโ€พโˆ—(X0)\overline{F}^{*}(X_{0})Fโˆ—(X0โ€‹))์˜ ์ฐจ์ด๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๋‚ฎ์€ Bias๋Š” ํ‰๊ท ์ ์œผ๋กœ ์šฐ๋ฆฌ๊ฐ€ ์‹ค์ œ๊ฐ’์— ๊ทผ์‚ฌํ•˜๊ฒŒ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•˜๋ฉฐ, ๋†’์€ Bias๋Š” ์‹ค์ œ๊ฐ’๊ณผ ์˜ˆ์ธก๊ฐ’์˜ ํ‰๊ท ์ด ๋ฉ€๋ฆฌ ๋–จ์–ด์ ธ์žˆ์Œ์œผ๋กœ poor match๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, Bias๋Š” ๋ชจ๋ธ์˜ ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’์ด ์–ผ๋งˆ๋‚˜ ๋–จ์–ด์ ธ์žˆ๋Š”๊ฐ€๋ฅผ ์˜๋ฏธํ•˜๋ฉฐ, Bias๊ฐ€ ํฌ๊ฒŒ ๋˜๋ฉด ๊ณผ์†Œ์ ํ•ฉ(underfitting)์„ ์•ผ๊ธฐํ•ฉ๋‹ˆ๋‹ค.
  • Variance๋Š” ์˜ˆ์ธก๊ฐ’๋“ค์˜ ํ‰๊ท (Fโ€พโˆ—(X0)\overline{F}^{*}(X_{0})Fโˆ—(X0โ€‹))์œผ๋กœ๋ถ€ํ„ฐ ํŠน์ • ์˜ˆ์ธก๊ฐ’๋“ค(F^(X0)\widehat{F}(X_{0})F(X0โ€‹))์ด ์–ด๋А ์ •๋„ ํผ์ ธ์žˆ๋Š” ๊ฐ€๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๋‚ฎ์€ Variance๋Š” ๋“ค์–ด์˜ค๋Š” ๋ฐ์ดํ„ฐ์— ๋”ฐ๋ผ ์˜ˆ์ธก๊ฐ’์ด ํฌ๊ฒŒ ๋ฐ”๋€Œ์ง€ ์•Š๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•˜๋ฉฐ, ๋†’์€ Variance๋Š” ๋“ค์–ด์˜ค๋Š” ๋ฐ์ดํ„ฐ์— ๋”ฐ๋ผ ์˜ˆ์ธก๊ฐ’์ด ํฌ๊ฒŒ ๋ฐ”๋€Œ๋ฏ€๋กœ poor match๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, Variance๋Š” ์˜ˆ์ธก ๋ชจ๋ธ์˜ ๋ณต์žก๋„๋ผ๊ณ  ํ•ด์„ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, Variance๊ฐ€ ํฐ ๋ชจ๋ธ์€ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์— ์ง€๋‚˜์น˜๊ฒŒ ์ ํ•ฉ์„ ์‹œ์ผœ ์ผ๋ฐ˜ํ™”๋˜์ง€ ์•Š๋Š” ๊ณผ๋Œ€์ ํ•ฉ(overfitting)์„ ์•ผ๊ธฐํ•ฉ๋‹ˆ๋‹ค.

Bias_Variance

์œ„ ๊ทธ๋ฆผ์€ Bias์™€ Variance๋ฅผ ๊ฐ€์žฅ ์ž˜ ์„ค๋ช…ํ•˜๋Š” ๊ทธ๋ฆผ(๊ฐœ์ธ์ ์ธ ์ƒ๊ฐ)์œผ๋กœ ๊ฐ€์šด๋ฐ ์œ„์น˜ํ•œ ์ฃผํ™ฉ์ƒ‰ ์›์ด ์šฐ๋ฆฌ๊ฐ€ ์‹ค์ œ๋กœ ์˜ˆ์ธกํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฐ’(target)์ด๊ณ , ํŒŒ๋ž€์ƒ‰ X๋“ค์ด ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ(output)๋ผ๊ณ  ์ƒ๊ฐํ•˜์‹œ๋ฉด๋ฉ๋‹ˆ๋‹ค. Bias๊ฐ€ ๋‚ฎ์„์ˆ˜๋ก ์‹ค์ œ ์ •๋‹ต๊ฐ’์— ์˜ˆ์ธก๊ฐ’๋“ค์ด ๊ฐ€๊น๊ฒŒ ์กด์žฌํ•˜๊ณ , Variance๊ฐ€ ๋‚ฎ์„์ˆ˜๋ก ์˜ˆ์ธก๊ฐ’๋“ค์˜ ํผ์ง„ ์ •๋„๊ฐ€ ์ค„์–ด๋“œ๋Š” ๊ฒƒ์„ ํ•œ ๋ˆˆ์— ๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ, ํ˜„์‹ค์ ์œผ๋กœ ๋‚ฎ์€ Bias์™€ ๋‚ฎ์€ Variance ๋‘ ๊ฐ€์ง€๋ฅผ ๋™์‹œ์— ๋งŒ์กฑํ•˜๋Š” ๊ฒƒ์€ ๊ฑฐ์˜ ๋ถˆ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์–ด๋А ์ •๋„์˜ tradeoff๋Š” ๋ฐ˜๋“œ์‹œ ์ƒ๊ธธ ์ˆ˜ ๋ฐ–์— ์—†์œผ๋ฉฐ ์ด๋Š” bias-variance trade-off ๋ผ๊ณ  ๋ถˆ๋ฆฝ๋‹ˆ๋‹ค..

Inductive Bias (๊ท€๋‚ฉํŽธํ–ฅ)

๊ทธ๋ ‡๋‹ค๋ฉด ์ด๋ฒˆ ํฌ์ŠคํŒ…์˜ ๋ฉ”์ธ ๋””์‰ฌ์ธ Inductive Bias๋Š” ๋ฌด์—‡์ผ๊นŒ์š”?

์ผ๋ฐ˜์ ์œผ๋กœ ๋ชจ๋ธ์ด ๊ฐ–๋Š” generalization problem์œผ๋กœ๋Š” ๋ชจ๋ธ์ด brittle(๋ถˆ์•ˆ์ •)ํ•˜๋‹ค๋Š” ๊ฒƒ๊ณผ, spurious(๊ฒ‰์œผ๋กœ๋งŒ ๊ทธ๋Ÿด์‹ธํ•œ)ํ•˜๋‹ค๋Š” ๊ฒƒ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

  • Models are brittle : ๋ฐ์ดํ„ฐ์˜ input์ด ์กฐ๊ธˆ๋งŒ ๋ฐ”๋€Œ์–ด๋„ ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ๊ฐ€ ๋ง๊ฐ€์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
  • Models are spurious : ๋ฐ์ดํ„ฐ ๋ณธ์—ฐ์˜ ์˜๋ฏธ๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹Œ ๊ฒฐ๊ณผ(artifacts)์™€ ํŽธํ–ฅ(biases)๋ฅผ ํ•™์Šตํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ๋“ค์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด Inductive Bias๋ฅผ ์ด์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. Wikipedia์—์„œ ์ •์˜๋ฅผ ๋นŒ๋ ค์˜ค์ž๋ฉด, Inductive bias๋ž€, ํ•™์Šต ์‹œ์—๋Š” ๋งŒ๋‚˜๋ณด์ง€ ์•Š์•˜๋˜ ์ƒํ™ฉ์— ๋Œ€ํ•˜์—ฌ ์ •ํ™•ํ•œ ์˜ˆ์ธก์„ ํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•˜๋Š” ์ถ”๊ฐ€์ ์ธ ๊ฐ€์ • (additional assumptions)์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

Machine learning์—์„œ๋Š” ์–ด๋–ค ๋ชฉํ‘œ(target)๋ฅผ ์˜ˆ์ธกํ•˜๊ธฐ ์œ„ํ•ด ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๊ตฌ์ถ•์„ ๋ชฉํ‘œ๋กœ ํ•˜๊ณ , ์ด๋ฅผ ์œ„ํ•ด ์ œํ•œ๋œ ์ˆ˜์˜ ์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ์–ด์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋ฅผ ๋„˜์–ด ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ๋„ ์ผ๋ฐ˜ํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ๋Šฅ๋ ฅ์„ ๊ฐ€์ง„ ๋ชจ๋“  ๊ธฐ๊ณ„ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์—๋Š” ์–ด๋–ค ์œ ํ˜•์˜ Inductive Bias๊ฐ€ ์กด์žฌํ•˜๋Š”๋ฐ, ์ด๋Š” ๋ชจ๋ธ์ด ๋ชฉํ‘œ ํ•จ์ˆ˜๋ฅผ ํ•™์Šตํ•˜๊ณ  ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋ฅผ ๋„˜์–ด ์ผ๋ฐ˜ํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋งŒ๋“  ๊ฐ€์ •์ž…๋‹ˆ๋‹ค.

๊ฐ‘์ž๊ธฐ ๊ฐ€์ •์ด๋ผ๋Š” ๋ง์ด ๋‚˜์™€์„œ ํ—ท๊ฐˆ๋ฆฌ๋‹ค๊ณ ์š”? ํŽธํ–ฅ์˜ ๊ด€์ ์œผ๋กœ ๋‹ค์‹œ ์ •์˜๋ฅผ ๋‚ด๋ ค๋ณด์ž๋ฉด ๋ชจ๋ธ์ด ํ•™์Šต๊ณผ์ •์—์„œ ๋ณธ ์ ์ด ์—†๋Š” ๋ถ„ํฌ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ž…๋ ฅ ๋ฐ›์•˜์„ ๋•Œ, ํ•ด๋‹น ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ํŒ๋‹จ์„ ๋‚ด๋ฆฌ๊ธฐ ์œ„ํ•ด ๊ฐ€์ง€๊ณ  ์žˆ๋Š”, ํ•™์Šต๊ณผ์ •์—์„œ ์Šต๋“๋œ Bias(ํŽธํ–ฅ)์ด๋ผ๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ๋”ฑ๋”ฑํ•œ ๊ฐœ๋…์ด ์•„๋‹Œ ์กฐ๊ธˆ ๋” ๋น„์œ ์ ์ธ ํ‘œํ˜„์„ ๊ฐ€์ง€๊ณ  ์˜ˆ์‹œ๋ฅผ ๋“ค์–ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๊ฐ€ ํ”ํžˆ ๋งํ•˜๋Š” ๋จธ์‹ ๋Ÿฌ๋‹/๋”ฅ๋Ÿฌ๋‹์„ input๊ณผ output์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ์–ด์ง€๋ฉด, ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์— ๋งž๋Š” ํ•จ์ˆ˜๋ฅผ ๊ฐ€๋ฐฉ์—์„œ ์ฐพ๋Š” ๊ฒƒ์ด๋ผ๊ณ  ๋น„์œ ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Inductive Bias๋Š” ์šฐ๋ฆฌ๊ฐ€ ํ•จ์ˆ˜๋ฅผ ์ฐพ๋Š” ๊ฐ€๋ฐฉ์˜ ํฌ๊ธฐ์— ๋ฐ˜๋น„๋ก€(๊ฐ€์ •์˜ ๊ฐ•๋„์™€๋Š” ๋น„๋ก€)๋˜๋Š” ๊ฐœ๋…์œผ๋กœ ๋ณด์‹œ๋ฉด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ ๊ฑฐ์˜ ๋ชจ๋“  ํ•จ์ˆ˜๋ฅผ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋Š” MLP(Multi-Linear Perceptron)์˜ ๊ฒฝ์šฐ ์—„์ฒญ ํฐ ๊ฐ€๋ฐฉ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜๋ฉด๋˜๊ณ , CNN(Convolutional Neural-Net)์˜ ๊ฒฝ์šฐ ์ „์ž๋ณด๋‹ค๋Š” ์ž‘์€ ๊ฐ€๋ฐฉ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

๋ช‡ ๊ฐ€์ง€ Inductive Bias๋ฅผ ์˜ˆ๋กœ ๋“ค์–ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

  • Translation invariance : ์–ด๋– ํ•œ ์‚ฌ๋ฌผ์ด ๋“ค์–ด ์žˆ๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ œ๊ณตํ•ด์ค„ ๋•Œ ์‚ฌ๋ฌผ์˜ ์œ„์น˜๊ฐ€ ๋ฐ”๋€Œ์–ด๋„ ํ•ด๋‹น ์‚ฌ๋ฌผ์„ ์ธ์‹ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • Translation Equivariance : ์–ด๋– ํ•œ ์‚ฌ๋ฌผ์ด ๋“ค์–ด ์žˆ๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ œ๊ณตํ•ด์ค„ ๋•Œ ์‚ฌ๋ฌผ์˜ ์œ„์น˜๊ฐ€ ๋ฐ”๋€Œ๋ฉด CNN๊ณผ ๊ฐ™์€ ์—ฐ์‚ฐ์˜ activation ์œ„์น˜ ๋˜ํ•œ ๋ฐ”๋€Œ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
  • Maximum conditional independence : ๊ฐ€์„ค์ด ๋ฒ ์ด์ง€์•ˆ ํ”„๋ ˆ์ž„์›Œํฌ์— ์บ์ŠคํŒ…๋  ์ˆ˜ ์žˆ๋‹ค๋ฉด ์กฐ๊ฑด๋ถ€ ๋…๋ฆฝ์„ฑ์„ ๊ทน๋Œ€ํ™”ํ•ฉ๋‹ˆ๋‹ค.
  • Minimum cross-validation error : ๊ฐ€์„ค ์ค‘์—์„œ ์„ ํƒํ•˜๋ ค๊ณ  ํ•  ๋•Œ ๊ต์ฐจ ๊ฒ€์ฆ ์˜ค์ฐจ๊ฐ€ ๊ฐ€์žฅ ๋‚ฎ์€ ๊ฐ€์„ค์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.
  • Maximum margin : ๋‘ ํด๋ž˜์Šค ์‚ฌ์ด์— ๊ฒฝ๊ณ„๋ฅผ ๊ทธ๋ฆด ๋•Œ ๊ฒฝ๊ณ„ ๋„ˆ๋น„๋ฅผ ์ตœ๋Œ€ํ™”ํ•ฉ๋‹ˆ๋‹ค.
  • Minimum description length : ๊ฐ€์„ค์„ ๊ตฌ์„ฑํ•  ๋•Œ ๊ฐ€์„ค์˜ ์„ค๋ช… ๊ธธ์ด๋ฅผ ์ตœ์†Œํ™”ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๋” ๊ฐ„๋‹จํ•œ ๊ฐ€์„ค์€ ๋” ์‚ฌ์‹ค์ผ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’๋‹ค๋Š” ๊ฐ€์ •์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • Minimum features: ํŠน์ • ํ”ผ์ณ๊ฐ€ ์œ ์šฉํ•˜๋‹ค๋Š” ๊ทผ๊ฑฐ๊ฐ€ ์—†๋Š” ํ•œ ์‚ญ์ œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
  • Nearest neighbors: ํŠน์ง• ๊ณต๊ฐ„์— ์žˆ๋Š” ์ž‘์€ ์ด์›ƒ์˜ ๊ฒฝ์šฐ ๋Œ€๋ถ€๋ถ„์ด ๋™์ผํ•œ ํด๋ž˜์Šค์— ์†ํ•œ๋‹ค๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค.

๋”ฅ๋Ÿฌ๋‹์—์„œ์˜ Inductive Bias

๋”ฅ๋Ÿฌ๋‹์˜ ๊ด€์ ์—์„œ Inductive Bias๋ฅผ ์ด์•ผ๊ธฐํ•ด๋ณด์ž๋ฉด, ๋”ฅ๋Ÿฌ๋‹์—์„œ ์šฐ๋ฆฌ๊ฐ€ ํ”ํžˆ ์Œ“๋Š” ๋ ˆ์ด์–ด์˜ ๊ตฌ์„ฑ์€ ์ผ์ข…์˜ Relational Inductive Bias(๊ด€๊ณ„ ๊ท€๋‚ฉ์  ํŽธํ–ฅ), ์ฆ‰ hierarchical processing(๊ณ„์ธต์  ์ฒ˜๋ฆฌ)๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋”ฅ๋Ÿฌ๋‹ ๋ ˆ์ด์–ด์˜ ์ข…๋ฅ˜์— ๋”ฐ๋ผ ์ถ”๊ฐ€์ ์ธ ๊ด€๊ณ„ ์œ ๋„ ํŽธํ–ฅ์„ ๋ถ€๊ณผ๋˜๋ฉฐ ์ด๋Š” ์•„๋ž˜ ํ‘œ๋ฅผ ์ฐธ๊ณ ํ•˜๋ฉด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Relational inductive biases, deep learning, and graph networks (Battaglia et. al, 2018)

Conclusion

Inductive Bias๊ฐ€ ๊ฐ•ํ• ์ˆ˜๋ก, Sample Efficiency๊ฐ€ ์ข‹์•„์ง€๊ธด ํ•˜๋‚˜ ๊ทธ๋งŒํผ ๊ฐ€์ •์ด ๊ฐ•ํ•˜๊ฒŒ ๋“ค์–ด๊ฐ„ ๊ฒƒ์ž„์œผ๋กœ ์ข‹๊ฒŒ ๋ณผ ์ˆ˜๋งŒ์€ ์—†์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์•ž์—์„œ ์†Œ๊ฐœํ•œ bias-variance tradeoff์™€ ์œ ์‚ฌํ•œ ๊ฐœ๋…์œผ๋กœ ๋ณด์‹œ๋ฉด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

๋งŽ์€ ํ˜„๋Œ€์˜ ๋”ฅ๋Ÿฌ๋‹ ๋ฐฉ๋ฒ•์€ ์ตœ์†Œํ•œ์˜ ์„ ํ–‰ ํ‘œํ˜„ ๋ฐ ๊ณ„์‚ฐ ๊ฐ€์ •์„ ๊ฐ•์กฐํ•˜๋Š” โ€œEnd-to-Endโ€ ์„ค๊ณ„ ์ฒ ํ•™์„ ๋”ฐ๋ฅด๋ฉฐ, ์ด๋Ÿฌํ•œ ํŠธ๋ Œ๋“œ๊ฐ€ ์™œ ๋Œ€๋ถ€๋ถ„์˜ ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฒ•๋“ค์ด ๋ฐ์ดํ„ฐ ์ง‘์•ฝ์ (Data-Intensive)์ธ์ง€๋ฅผ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

๋ฐ˜๋ฉด์—, ๋ช‡๋ช‡ ๋”ฅ๋Ÿฌ๋‹ ์•„ํ‚คํ…์ฒ˜(Eg. Graph Network)์— ๋” ๊ฐ•ํ•œ ๊ด€๊ณ„ Inductive Bias๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์— ๋Œ€ํ•œ ๋งŽ์€ ์—ฐ๊ตฌ๋“ค์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

์ถœ์ฒ˜

๋ณธ ํฌ์ŠคํŒ…์€ ์•„๋ž˜ ์‚ฌ์ดํŠธ๋“ค์„ ์ฐธ๊ณ ํ•˜์—ฌ ์ž‘์„ฑ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.