[๋จธ์‹ ๋Ÿฌ๋‹][์ฐจ์›์ถ•์†Œ] ๋ณ€์ˆ˜ ์ถ”์ถœ๋ฒ• - Principal Component Analysis (PCA)

Posted by Euisuk's Dev Log on September 21, 2021

[๋จธ์‹ ๋Ÿฌ๋‹][์ฐจ์›์ถ•์†Œ] ๋ณ€์ˆ˜ ์ถ”์ถœ๋ฒ• - Principal Component Analysis (PCA)

Original post: https://velog.io/@euisuk-chung/๋จธ์‹ ๋Ÿฌ๋‹์ฐจ์›์ถ•์†Œ-๋ณ€์ˆ˜-์ถ”์ถœ๋ฒ•-Principal-Component-Analysis-PCA

๋ณธ ํฌ์ŠคํŠธ๋Š” ๊ณ ๋ ค๋Œ€ํ•™๊ต ๊ฐ•ํ•„์„ฑ ๊ต์ˆ˜๋‹˜์˜ ๊ฐ•์˜๋ฅผ ์ˆ˜๊ฐ• ํ›„ ์ •๋ฆฌ๋ฅผ ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ž‘์„ฑ ๋ฐ ์„ค๋ช…์˜ ํŽธ์˜๋ฅผ ์œ„ํ•ด ์•„๋ž˜๋Š” ํŽธํ•˜๊ฒŒ ์ž‘์„ฑํ•œ ์  ์–‘ํ•ด๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

Dimensionality Reduction

Supervised Variable Extraction

์ฐจ์›์ถ•์†Œ๋Š”, ๋ชจ๋ธ๋ง์„ ํ•˜๊ธฐ ์œ„ํ•ด ๋‚ด๊ฐ€ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ์˜ ์ •๋ณด๋ฅผ ์ตœ๋Œ€ํ•œ ๋ณด์กดํ•˜๋ฉด์„œ, ํ›จ์”ฌ ๋” compactํ•˜๊ฒŒ ๋ฐ์ดํ„ฐ์…‹์„ ๊ตฌ์„ฑํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•˜๋ฉฐ, ํฌ๊ฒŒ ๋ณ€์ˆ˜์„ ํƒ(Variable Selection, ๋ณ€์ˆ˜๋“ค์˜ ๋ถ€๋ถ„ ์ง‘ํ•ฉ ์„ ํƒ)๊ณผ ๋ณ€์ˆ˜์ถ”์ถœ(Variable Extraction, ๋ณ€์ˆ˜๋“ค์„ ์š”์•ฝํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ณ€์ˆ˜ ์ƒ์„ฑ)์ด ์žˆ๋‹ค.

์ด์ „ ํฌ์ŠคํŠธ์—์„œ๋Š” ์•„๋ž˜ ํ‘œ์—์„œ Supervised Variable Selection์„ ์‚ดํŽด๋ณด์•˜๋‹ค. ์˜ค๋Š˜์€ Supervised Variable Extraction์— ๋Œ€ํ•ด ์ด์•ผ๊ธฐ๋ฅผ ํ’€์–ด๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค.

[Table: Supervised Variable Selection]

Principal Component Analysis

Principal Component Analysis (PCA) is the representative variable-extraction technique: it finds an orthogonal basis that preserves the variance of the original data as much as possible.

[Figure: PCA]

Q. ๋‹ค์Œ ๋ฐ‘์— ๋‘ ๊ฐ€์ง€์˜ ๊ทธ๋ฆผ์€ ์œ„์˜ ๊ทธ๋ฆผ์„ ์‚ฌ์˜์‹œํ‚จ ๊ธฐ์ €๋“ค์ด๋‹ค. ์ด๋•Œ, ๊ธฐ์ €๋Š” ๊ฐ๊ฐ์˜ ๋ฐ์ดํ„ฐ๋“ค์ด ์‚ฌ์˜์ด ๋˜๋Š” ๋Œ€์ƒ์˜ ๋ฒกํ„ฐ๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ๋ถ„์‚ฐ(Variance)์„ ์ž˜ ๋ณด์กดํ•œ๋‹ค๋Š” ๋ฌด์Šจ ์˜๋ฏธ์ผ๊นŒ?

[Figure: PCA Variance]

Variability means how spread out the projected points are along each basis. Preserving it well means retaining as much of the variance the original data had as possible. In the figures above, the data are projected onto the X-axis and the Y-axis as bases, and you can see how each projection preserves the variance of the pre-projection data.

PCA aims to find a set of bases that can preserve the variance as much as possible after the projection onto those bases.
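To make this concrete, here is a minimal sketch (the data, mixing matrix, and random seed are all made up for illustration) using scikit-learn's PCA to project correlated 2-D data onto the single basis that preserves the most variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Made-up correlated 2-D data: most variance lies along one direction.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])

pca = PCA(n_components=1)
Z = pca.fit_transform(X)  # projection onto the first (best) basis vector

print(Z.shape)                        # (200, 1)
print(pca.explained_variance_ratio_)  # fraction of variance preserved
```

Because the two columns are strongly correlated, a single principal component keeps well over 90% of the original variance here.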

[Figure: PC1 and PC2]

(Note) Covariance Matrix

  • Definition:

    A matrix that collects the covariances between two or more variables of multivariate data.

💡 Covariance formula

[Figure: Covariance formulas]

💡 Covariance matrices for 2 and 3 variables

[Figure: 2-variable and 3-variable covariance matrices]

  • Notation:

    • $X$: column-wise dataset ($d \times n$; $d$ is the number of variables, $n$ the number of records)
    • $\operatorname{Cov}(X)=\frac{1}{n}(X-\bar{X})(X-\bar{X})^{T}$, where $(d \times n)*(n \times d)=(d \times d)$
  • Properties:

    1. Symmetric matrix

    $\operatorname{Cov}(X)_{ij}=\operatorname{Cov}(X)_{ji}$

    2. Total variance of the dataset = sum of the diagonal entries

    $\operatorname{tr}(\operatorname{Cov}(X))=\operatorname{Cov}(X)_{11}+\operatorname{Cov}(X)_{22}+\cdots+\operatorname{Cov}(X)_{dd}$

(Note) Projection

  • Definition:

    Dropping a perpendicular from a point onto a line or a plane is called projection.

[Figure: Projection]

(Note) Eigen-Value & Eigen-Vector

  • Given a matrix A, the scalar λ and the vector x satisfying the following are defined as an eigenvalue (Eigen-Value) and an eigenvector (Eigen-Vector), respectively.

$\mathbf{A x}=\lambda \mathbf{x} \quad \rightarrow \quad(\mathbf{A}-\lambda \mathbf{I}) \mathbf{x}=0$

  • Multiplying a vector by a matrix performs a linear transformation. In the figure below, some vectors (blue, pink) change only in magnitude while keeping their direction; these are the eigenvectors, and the factor λ by which their magnitude changes is the eigenvalue.

[Figure: Linear transformation]

  • If a d×d matrix A is invertible, its eigenvalues and eigenvectors have the following properties.

    ① A has d pairs of eigenvalues and eigenvectors.

    ② The eigenvectors are orthogonal to each other: $e_{i} \cdot e_{j}=0 \text{ if } i \neq j$.

    ③ $\operatorname{tr}(A)=\lambda_{1}+\lambda_{2}+\lambda_{3}+\cdots+\lambda_{d}=\sum_{i=1}^{d} \lambda_{i}$

PCA Procedure

[1] Data Centering

  • ํ‰๊ท ์„ 0์œผ๋กœ ๋งž์ถ˜๋‹ค. Xโ€ฒ=Xโˆ’Xห‰X^{\prime}=X-\bar{X}Xโ€ฒ=Xโˆ’Xห‰ ์ด๋ผ๊ณ  ํ•˜๋ฉด, Xโ€ฒห‰\bar{X^{\prime}}Xโ€ฒห‰๋Š” 0์ด ๋œ๋‹ค.
  • ์•ž์„œ ์†Œ๊ฐœํ•œ Covโก(X)=1n(Xโˆ’Xห‰)(Xโˆ’Xห‰)T\operatorname{Cov}(X)=\frac{1}{n}(X-\bar{X})(X-\bar{X})^{T}Cov(X)=n1โ€‹(Xโˆ’Xห‰)(Xโˆ’Xห‰)T์˜ (Xโˆ’Xห‰)(X-\bar{X})(Xโˆ’Xห‰) ๊ณผ์ •
  • ๋’ค์—์„œ๋Š” ์‹ ์ž‘์„ฑ์˜ ํŽธ์˜๋ฅผ ์œ„ํ•ด Xโ€ฒ=XX^{\prime}=XXโ€ฒ=X๋กœ ํ‘œ๊ธฐํ•จ

[Figure: Data Centering]
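Step [1] as a tiny sketch (toy numbers, column-wise $d \times n$ layout as above): subtract each variable's mean so the centered data has mean zero.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])  # d=2 variables, n=3 records

X_centered = X - X.mean(axis=1, keepdims=True)
print(X_centered.mean(axis=1))   # [0. 0.]
```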

[2] Formulation

  • ์•ž์—์„œ ๋งŒ๋“ค์–ด์ง„ X(Xโ€™)์— ๋Œ€ํ•˜์—ฌ ๋ฒกํ„ฐ(ํ–‰๋ ฌ) X๋ฅผ ๊ธฐ์ € w์— ์ •์‚ฌ์˜ ์‹œํ‚ค๋ฉด, Cov(X) ์‹์ด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ฐ”๋€Œ๊ฒŒ ๋œ๋‹ค.
  • ์ด๋•Œ S๋Š” ํ‘œ๋ณธ ๋ถ„์‚ฐ์˜ sample covariance matrix๊ฐ€ ๋œ๋‹ค. ( โˆต ์ด๋ฏธ centering ์ž‘์—…์„ ์ˆ˜ํ–‰ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— )

V=1n(wTX)(wTX)T=1nwTXXTw=wTSwV=\frac{1}{n}\left(\mathbf{w}^{\mathrm{T}} \mathbf{X}\right)\left(\mathbf{w}^{\mathrm{T}} \mathbf{X}\right)^{\mathrm{T}}=\frac{1}{n} \mathbf{w}^{\mathrm{T}} \mathbf{X} \mathbf{X}^{\mathrm{T}} \mathbf{w}=\mathbf{w}^{\mathrm{T}} \mathbf{S} \mathbf{w}V=n1โ€‹(wTX)(wTX)T=n1โ€‹wTXXTw=wTSw

[3] Optimization

  • PCA์˜ ๋ชฉ์ ์ธ ๋ถ„์‚ฐ์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๋ชฉ์ ์— ๋”ฐ๋ผ์„œ ์ตœ์ ํ™” ์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‚˜ํƒ€๋‚ด์งˆ ์ˆ˜ ์žˆ๋‹ค.

maxโกwTSwย s.t.ย wTw=1\begin{gathered} \max \mathbf{w}^{\mathrm{T}} \mathbf{S} \mathbf{w} \ \text { s.t. } \mathbf{w}^{\mathrm{T}} \mathbf{w}=1 \end{gathered}maxwTSwย s.t.ย wTw=1โ€‹

[4] Solve

  • ์œ„์—์„œ ๊ตฌํ•œ ์ตœ์ ํ™”์‹์€ ๋ฐ”๋กœ ํ’€ ์ˆ˜๊ฐ€ ์—†์œผ๋ฏ€๋กœ, Lagrange Multiplier(๋ผ๊ทธ๋ž‘์ฃผ ์Šน์ˆ˜๋ฒ•)์„ ์ด์šฉํ•˜์—ฌ ํ’€ ์ˆ˜ ์žˆ๋‹ค.

💡 Wait a moment! What is a Lagrange multiplier?

ใ…ค๋ผ๊ทธ๋ž‘์ฃผ ์Šน์ˆ˜๋ฒ• (Lagrange multiplier method)์€ ๊ธฐ๋ณธ ๊ฐ€์ •์€ โ€œ์ œ์•ฝ ์กฐ๊ฑด g(x,y) = c๋ฅผ ๋งŒ์กฑํ•˜๋Š” f(x,y)์˜ ์ตœ์†Ÿ๊ฐ’ ๋˜๋Š” ์ตœ๋Œ“๊ฐ’์€ f(x,y)์™€ g(x,y)๊ฐ€ ์ ‘ํ•˜๋Š” ์ง€์ ์— ์กด์žฌํ•  ์ˆ˜๋„ ์žˆ๋‹ค.โ€๋Š” ๊ฒƒ์ด๋‹ค. ์ด๋•Œ, f(x,y)์™€ g(x,y)๊ฐ€ ์ ‘ํ•˜๋Š” ์ง€์ ์„ ์ฐพ๊ธฐ ์œ„ํ•ด gradient vector(โ–ฝ)์„ ์ด์šฉํ•œ๋‹ค. ์–ด๋– ํ•œ ์ง€์ ์—์„œ ์ ‘์„  ๋ฒกํ„ฐ์™€ gradient vector์˜ ๋‚ด์ ์€ 0์ด ๋˜๊ฒŒ ๋œ๋‹ค. ๋”ฐ๋ผ์„œ, ๋‘ ํ•จ์ˆ˜ f์™€ g๊ฐ€ ์ ‘ํ•œ๋‹ค๋Š” ๊ฒƒ์€ ๋‘ ํ•จ์ˆ˜์˜ gradient vector๋Š” ์„œ๋กœ ์ƒ์ˆ˜๋ฐฐ์˜ ๊ด€๊ณ„์— ์žˆ๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋‹ค์Œ ๊ทธ๋ฆผ์„ ๋ณด๋ฉด ์ด๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ์‹์œผ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค: โ–ฝf=ฮปโ–ฝg

[Figure: Lagrange Multiplier]

ใ…ค๋ผ๊ทธ๋ž‘์ฃผ ์Šน์ˆ˜๋ฒ•์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ•จ์ˆ˜๋ฅผ ์ •์˜ํ•˜๊ณ , L(x,y,ฮป)=ฮป(g(x,y)-c) ๋กœ ์ •์˜ํ•˜๊ณ , ํ•จ์ˆ˜ L์˜ gradient vector๊ฐ€ ์˜๋ฒกํ„ฐ๊ฐ€ ๋˜๋Š” ์ ์„ ์ฐพ์œผ๋ฉด ๋‘ ํ•จ์ˆ˜ f์™€ g๊ฐ€ ์ ‘ํ•˜๊ฒŒ ๋˜๋Š” ์ ์„ ์ฐพ์„ ์ˆ˜ ์žˆ๊ฒŒ ๋œ๋‹ค.

ใ…ค์œ„์—์„œ ์ •์˜ํ•œ ์‹์„ ํ’€์–ด๋ณด๋ฉด, ํ•จ์ˆ˜ L์˜ gradient vector๊ฐ€ ์˜๋ฒกํ„ฐ๊ฐ€ ๋˜๋Š” ์ ์„ ์ฐพ์œผ๋ฉด ๋‘ ํ•จ์ˆ˜ f์™€ g๊ฐ€ ์ ‘ํ•˜๋Š” ์ ์„ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค. ํ•จ์ˆ˜ L์„ x์™€ y์— ๋Œ€ํ•ด ํŽธ๋ฏธ๋ถ„ํ•˜๋ฉด ์ด 2๊ฐœ์˜ ์‹์„ ์–ป์„ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์—ฌ๊ธฐ์— ์ œ์•ฝ ์กฐ๊ฑด์ธ g(x,y)=c๋ฅผ ์ด์šฉํ•˜๋ฉด ๋ฏธ์ง€์ˆ˜๊ฐ€ 3๊ฐœ(x,y,ฮป)์ธ ๋ฌธ์ œ์˜ ํ•ด (solution)๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ ๊ตฌํ•œ x์™€ y๋Š” ์ œ์•ฝ ์กฐ๊ฑด g๋ฅผ ๋งŒ์กฑํ•˜๋Š” ํ•จ์ˆ˜ f์˜ ์ตœ์ ์ ์ด ๋  ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๊ฒŒ ๋œ๋‹ค.

  • ์œ„์—์„œ ๋„์ถœํ•œ maximize ๋ชฉ์ ํ•จ์ˆ˜์™€ ์ œ์•ฝ์‹์„ ์ด์šฉํ•˜์—ฌ Lagrange Multiplier(๋ผ๊ทธ๋ž‘์ฃผ ์Šน์ˆ˜๋ฒ•)์„ ์ ์šฉํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์‹์ด ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค.L=wTSwโˆ’ฮป(wTwโˆ’1)L=\mathbf{w}^{T} \mathbf{S} \mathbf{w}-\lambda\left(\mathbf{w}^{T} \mathbf{w}-1\right)L=wTSwโˆ’ฮป(wTwโˆ’1)

โˆ‚Lโˆ‚w=0โ‡’Swโˆ’ฮปw=0โ‡’(Sโˆ’ฮปI)w=0\frac{\partial L}{\partial \mathbf{w}}=0 \Rightarrow \mathbf{S} \mathbf{w}-\lambda \mathbf{w}=0 \Rightarrow(\mathbf{S}-\lambda \mathbf{I}) \mathbf{w}=0โˆ‚wโˆ‚Lโ€‹=0โ‡’Swโˆ’ฮปw=0โ‡’(Sโˆ’ฮปI)w=0

  • In the end, the solution of principal component analysis is: w is an eigenvector of S, and λ is an eigenvalue of S. Below is an example of computing these values on real data.

[Figure: Worked example]
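The solution can also be verified numerically; in this sketch S is a made-up sample covariance matrix, and the top eigenvector of S is exactly the w that maximizes $\mathbf{w}^{T}\mathbf{S w}$ under $\mathbf{w}^{T}\mathbf{w}=1$:

```python
import numpy as np

S = np.array([[10.0, 4.0],
              [4.0,  2.0]])   # made-up sample covariance matrix

lam, W = np.linalg.eigh(S)    # eigenvalues in ascending order
w1 = W[:, -1]                 # eigenvector of the largest eigenvalue

# The eigenpair satisfies (S - lambda I) w = 0 ...
assert np.allclose(S @ w1, lam[-1] * w1)
# ... and the projected variance w^T S w equals lambda_1.
assert np.isclose(w1 @ S @ w1, lam[-1])
print(round(lam[-1], 4))  # 11.6569
```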

[5] Find the best set of bases

  • The optimal bases can be found by computing the projected variance.
  • Sorting the eigenvalues obtained above in descending order gives the following.

[Figure: Eigenvalues in descending order]

  • Let λ1 be the largest eigenvalue and w1 its eigenvector. The variance (variation) of the data projected onto w1 can then be computed as follows.

$\mathbf{v}=\frac{1}{n}\left(\mathbf{w}_{1}^{T} \mathbf{X}\right)\left(\mathbf{w}_{1}^{T} \mathbf{X}\right)^{T}=\frac{1}{n} \mathbf{w}_{1}^{T} \mathbf{X X}^{T} \mathbf{w}_{1}=\mathbf{w}_{1}^{T} \mathbf{S w}_{1}$

Since $\mathbf{S w}_{1}=\lambda_{1} \mathbf{w}_{1}$: $\quad \mathbf{w}_{1}^{T} \mathbf{S w}_{1}=\mathbf{w}_{1}^{T} \lambda_{1} \mathbf{w}_{1}=\lambda_{1} \mathbf{w}_{1}^{T} \mathbf{w}_{1}=\lambda_{1}$

  • ๋”ฐ๋ผ์„œ, ์ด๋ ‡๊ฒŒ w1์— ์‚ฌ์˜์‹œํ‚จ variation(v)์˜ ๊ฐ’์ด ฮป1์ธ ๊ฒƒ์„ ํ™•์ธํ–ˆ๊ณ , ์–ผ๋งŒํผ ํ•ด๋‹น PC(principal component)๊ฐ€ original variance๋ฅผ ๋ณด์กดํ•˜๋Š” ๊ฐ€๋Š” ๋‹ค์Œ์‹์„ ํ†ตํ•ด ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ํ•ด๋‹น ์ฃผ์„ฑ๋ถ„์€ 2์ฐจ์› ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์˜ํ•˜๋ฉด ์›-๋ฐ์ดํ„ฐ์˜ ๋ถ„์‚ฐ์˜ 96%๋ฅผ ๋ณด์กดํ•œ๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.ฮป1ฮป1+ฮป2=1.28401.2840+0.0491=0.9632โ€ฆ\frac{ฮป_{1}}{ฮป_{1}+ฮป_{2}}=\frac{1.2840}{1.2840+0.0491}=0.9632โ€ฆฮป1โ€‹+ฮป2โ€‹ฮป1โ€‹โ€‹=1.2840+0.04911.2840โ€‹=0.9632โ€ฆ

[6] Extraction and Reconstruction

  • The principal components (bases) can be used to generate new variables and then to reconstruct the original data.

    (2-D -> 1-D -> 2-D)

  • The reconstruction is not perfect: a reconstruction error remains, which can later be used for anomaly detection.

2์ฐจ์›_1์ฐจ์›_2์ฐจ์›

  • Then how many Principal Components (PCs) are optimal? (There is no exact answer, sadly.)

    ① Qualitatively, a domain expert can decide the number of PCs ("this many works best!").

    ② Quantitatively, the number of PCs can be set by how many percent of the original variance is preserved.

[Figure: Scree Plot / Elbow]

  • Drawing a Scree Plot, one can either ① compute the variance of each PC and find the elbow point (where the curve drops off sharply), or ② set a cumulative explained-variance threshold of N% to determine the optimal number of PCs.
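Criterion ② can be sketched as follows (the eigenvalues and the 90% threshold are made-up assumptions): take the smallest number of PCs whose cumulative explained-variance ratio reaches the threshold.

```python
import numpy as np

eigvals = np.array([4.0, 2.5, 1.0, 0.3, 0.2])   # made-up, sorted descending
cum_ratio = np.cumsum(eigvals) / eigvals.sum()  # cumulative explained variance

n_pc = int(np.searchsorted(cum_ratio, 0.90) + 1)  # first index reaching 90%
print(cum_ratio.round(3))
print(n_pc)  # 3
```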

PCA Limitations

  1. Because PCA is a model rooted in statistics, it assumes a Gaussian distribution, so it may not work well on non-Gaussian data.
  2. Because it learns by preserving as much variance as possible, it is not well suited for classification models (see the figure below).

[Figure: PCA vs. LDA]

Thank you for reading this long post ^~^


