[Machine Learning] Anomaly Detection Overview and Density-Based Anomaly Detection

Posted by Euisuk's Dev Log on November 21, 2021


Original post: https://velog.io/@euisuk-chung/머신러닝차원축소-이상치-탐지-기법-밀도기반-이상치-탐지

This post summarizes what I learned from Professor Pilsung Kang's lectures at Korea University. For ease of writing and explanation, the post below is written in an informal tone; thank you for your understanding.

What Is Abnormal Data?

  • Anomaly data is defined by Hawkins and by Harmeling et al. as follows.

Observations that deviate so much from other observations as to arouse suspicions that they were generated by a different mechanism. - Hawkins, 1980

Instances that their true probability density is very low. - Harmeling et al., 2006

  • To summarize, an anomaly is data whose generating mechanism differs from that of the existing data, or data that occurs with low frequency.
  • So how does anomaly detection differ from the binary classification we usually perform in deep learning and machine learning?

1. Learning methodology perspective

  • Binary Classification: learns a decision boundary (line) that separates normal from abnormal, and splits the data into normal and abnormal on that basis. (A, B: normal)
  • Anomaly Detection: anomalies do not come in a single kind. For example, suppose x and ▲ are both anomalies. Each of them is too scarce to represent anomalies as a class, so they cannot be lumped together under one binary "anomaly" label. Anomaly detection therefore estimates the normal region from normal data and judges any value that falls outside that region to be not normal. (A, B: abnormal)

AD define

2. Training data perspective

  • Anomaly detection basically assumes that "normal" data far outnumbers "abnormal" data.
  • Binary Classification: trains on both normal and abnormal (anomalous) data.
  • Anomaly Detection: trains only on normal data, with the abnormal (anomalous) data excluded.

    AD Data

3. Evaluation method perspective

AD Metric

Density-Based Anomaly Detection

  • This approach estimates the distribution of the normal state from the normal data, then labels a new object as normal if its probability under that distribution is high and as abnormal if it is low. A parametric model describing the data distribution is usually assumed, and when the estimate is based on the normal distribution, the methods can be categorized as follows according to how many Gaussian models are used.

Density-based

Gaussian Density Estimation

  • Assumption

    • The observations were generated from a single Gaussian.

    Gaussian Density Estimation

  • Advantages

    1. It is not sensitive to the scale of the data. (∵ the covariance matrix normalizes out the effect of measurement units)

    2. The Type I error for rejection can be defined up front from the training data used to estimate the distribution. (e.g., a 95% confidence level)

  • Formulation (Parameter estimation: μ, σ²)

G-Formulation
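As a rough illustration of the single-Gaussian approach, here is a minimal univariate sketch in plain Python. It estimates μ and σ² by maximum likelihood from normal data only, then rejects points outside μ ± 1.96σ, which corresponds to a Type I error of about 5% under the Gaussian assumption. The function names, the toy data, and the two-sided threshold rule are my own assumptions for illustration, not something taken from the lecture.

```python
import math

def fit_gaussian(xs):
    """Maximum-likelihood estimates of the mean and variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def is_anomaly(x, mu, var, z=1.96):
    """Flag x when it lies outside mu +/- z * sigma (z=1.96 covers ~95%)."""
    return abs(x - mu) > z * math.sqrt(var)

# Hypothetical "normal" training data (e.g., a sensor reading around 10).
normal_data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mu, var = fit_gaussian(normal_data)

for x in (10.1, 12.0):
    print(x, gaussian_pdf(x, mu, var), is_anomaly(x, mu, var))
```

Because the cutoff is expressed in units of σ rather than in raw units, rescaling the data (say, from meters to centimeters) does not change which points get flagged.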

Mixture of Gaussian Density Estimation

  • Assumption

    • The observations were generated from a linear combination of several Gaussians.
  • The Gaussian mixture model and each individual Gaussian model (equations)

    MoG and each Gaussian Models

  • Formulation (Parameter estimation: μ_m, w_m, Σ_m)

    • The parameters can be obtained with the EM algorithm.

    MoG-Formulation

    MoG-Formulation2
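The EM updates can be sketched for the one-dimensional case as follows. This is a simplified illustration rather than the lecture's exact formulation: it uses scalar variances instead of full covariance matrices Σ_m, runs a fixed number of iterations, and the initial means, the variance floor `min_var`, and the toy data are all assumptions of mine.

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mog_1d(xs, mus, n_iter=50, min_var=1e-6):
    """EM for a 1-D mixture of Gaussians; `mus` holds the initial means."""
    m = len(mus)
    mus = list(mus)
    weights = [1.0 / m] * m
    variances = [1.0] * m
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [weights[j] * gaussian_pdf(x, mus[j], variances[j]) for j in range(m)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate w_m, mu_m, and the variance from the responsibilities.
        for j in range(m):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids a collapsing component
    return weights, mus, variances

def mog_density(x, weights, mus, variances):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, mu, var) for w, mu, var in zip(weights, mus, variances))

# Hypothetical normal data with two modes, around 1 and around 5.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.9, 5.0, 5.2, 5.1, 4.8]
weights, mus, variances = fit_mog_1d(data, mus=[0.0, 6.0])
print(weights, mus, variances)
print(mog_density(1.0, weights, mus, variances), mog_density(3.0, weights, mus, variances))
```

A point between the two modes, such as 3.0, receives a very low density and would be flagged as abnormal, whereas a single Gaussian fitted to the same data would place its mean right there.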

Kernel Density Estimation

  • The preceding Gaussian Density Estimation and Mixture of Gaussian Density Estimation were parametric approaches: they assume a specific distribution and estimate its parameters.
  • Kernel Density Estimation is a non-parametric approach that estimates the density from the data itself, without assuming a distribution.
  • Let P be the probability that a vector x drawn from the distribution p(x) falls inside a given region R of the sample space. Then we obtain the following equation.

    KDE

  • In the equation below, fixing V and finding k is the main idea of kernel density estimation, and Parzen Window Density Estimation is one of its representative methods.

    KDE2
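For reference, the standard textbook form of this argument can be written out as below. This is a reconstruction under the usual assumptions (N samples drawn independently from p(x), k of them falling inside R, and R small enough that p(x) is roughly constant over its volume V), so the notation may differ from the lecture's.

```latex
P = \int_{R} p(x')\,dx'
\qquad
k \approx N P
\qquad
P \approx p(x)\,V
\quad\Longrightarrow\quad
p(x) \approx \frac{k}{N V}
```

Fixing V and counting k gives kernel density estimation; fixing k and growing V until it contains k samples gives the k-nearest-neighbor density estimate instead.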

Parzen Window Density Estimation

  • Parzen Window Density Estimation estimates the density by counting the number of samples that fall inside a d-dimensional hypercube (V = h^d).
  • The K(u) expression below is a function that returns 1 when a sample (X) falls inside that region, and k is the number of samples inside it. Such a K is a kind of kernel function and is called a Parzen window. Using it, the probability density function of the data can be obtained as follows.

    Parzen Window

  • However, the K(u) function above assigns 1 inside the region and 0 outside, so it is discontinuous at the edges of the cube. Because it is a uniform distribution, it also has the drawback of giving every sample the same weight regardless of its distance.
  • To overcome this, we can apply smoothing: treat each individual object as the center of a Gaussian distribution and compute the probability density function from those Gaussians.

    Parzen Window2

  • If the smoothing parameter h is set too small, the estimate becomes spiky (under-smoothing); if h is set too large, the estimate becomes blurry (over-smoothing).

    Parzen Window3
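The two kernels discussed above can be compared with a short sketch. It assumes the common form p(x) = 1/(N h^d) Σ K((x − x_i)/h), with a unit hypercube for the Parzen window and a standard Gaussian for the smoothed version. The function names and the toy samples are illustrative assumptions.

```python
import math

def hypercube_kernel(u):
    """K(u) = 1 if every coordinate satisfies |u_j| <= 1/2, else 0."""
    return 1.0 if all(abs(uj) <= 0.5 for uj in u) else 0.0

def gaussian_kernel(u):
    """Standard d-dimensional Gaussian kernel, a smooth alternative."""
    d = len(u)
    return math.exp(-0.5 * sum(uj * uj for uj in u)) / (2 * math.pi) ** (d / 2)

def parzen_density(x, samples, h, kernel):
    """p(x) = 1 / (N * h^d) * sum_i K((x - x_i) / h)."""
    n, d = len(samples), len(x)
    total = sum(kernel([(a - b) / h for a, b in zip(x, s)]) for s in samples)
    return total / (n * h ** d)

# Hypothetical 1-D samples: three close together, two farther away.
samples = [[0.0], [0.2], [0.4], [1.0], [3.0]]

# Hypercube window of width h=1 centered at 0.2 contains 3 of the 5 samples.
p_cube = parzen_density([0.2], samples, h=1.0, kernel=hypercube_kernel)
p_gauss_dense = parzen_density([0.2], samples, h=1.0, kernel=gaussian_kernel)
p_gauss_sparse = parzen_density([3.0], samples, h=1.0, kernel=gaussian_kernel)
print(p_cube, p_gauss_dense, p_gauss_sparse)

# Effect of the bandwidth: the same query point under a small and a large h.
for h in (0.1, 1.0, 5.0):
    print(h, parzen_density([0.7], samples, h=h, kernel=gaussian_kernel))
```

With the hypercube kernel, a sample just inside the window counts fully and one just outside counts for nothing, while the Gaussian kernel lets the contribution decay smoothly with distance.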

Local Outlier Factor (LOF)

  • In Cluster1 and Cluster2 below, the points O1 and O2 are the same distance away from their respective clusters. Which of them should be called an anomaly? If we compare distance alone, both objects are equally far from their clusters and would be judged the same, but that is not the result we want.

    LOF

  • Goal of Local Outlier Factor (LOF): we want the abnormal score of O2 to be measured as larger than the abnormal score of O1.
  • To understand LOF, we need to know the following five concepts.

① k-distance(p)

  • The distance from object p to its k-th nearest neighbor

    LOF Step1

② N_k(p)

  • The set of objects that fall within k-distance(p) of p

    LOF Step2

③ reachability-distance_k(p, o)

  • max{k-distance(o), d(p,o)}: taking the neighbor o as the reference, this returns the larger of k-distance(o) and d(p,o). As a result, for any object lying inside the k-distance of o, the actual distance is replaced with k-distance(o).

    LOF Step3

④ lrd_k(p)

  • Local reachability density of an object p: the inverse of the average reachability distance from p to the neighbors in N_k(p).

    LOF Step4

⑤ LOF_k(p)

  • Local outlier factor of an object p

    LOF Step5

  • Disadvantages

    1. The computational complexity is high.
    2. The score is not normalized, so it cannot be compared across different data sets.
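The five steps can be tied together in a small pure-Python sketch. It follows the usual LOF definitions: lrd_k(p) is |N_k(p)| divided by the sum of reachability distances from p to its neighbors, and LOF_k(p) is the average ratio lrd_k(o) / lrd_k(p) over those neighbors. It assumes no duplicate points (which would make a density infinite). The function name and the toy data are my own, and the full pairwise distance matrix it builds is a direct example of the computational cost noted above.

```python
import math

def lof_scores(points, k):
    """Return LOF_k for every point; assumes no duplicate points."""
    n = len(points)
    # Full pairwise distance matrix: O(n^2) time and memory.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neigh = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[k - 1]                      # (1) k-distance(p)
        k_dist.append(kd)
        # (2) N_k(p): every other point within k-distance(p), ties included
        neigh.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    def reach_dist(i, o):
        # (3) reachability-distance_k(p, o) = max{k-distance(o), d(p, o)}
        return max(k_dist[o], dist[i][o])

    # (4) lrd_k(p): inverse of the mean reachability distance to the neighbors
    lrd = [len(neigh[i]) / sum(reach_dist(i, o) for o in neigh[i]) for i in range(n)]

    # (5) LOF_k(p): mean of lrd_k(o) / lrd_k(p) over the neighbors o
    return [sum(lrd[o] for o in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]

# Hypothetical data: a 3x3 grid cluster plus one isolated point at (6, 6).
points = [(x, y) for x in range(3) for y in range(3)] + [(6, 6)]
scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 3))
```

Points inside the grid get scores close to 1, meaning their density is similar to that of their neighbors, while the isolated point gets a score well above 1. Since the score is a ratio of local densities, its scale depends on the data and on k, which is why scores from different data sets are hard to compare.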

The next post will cover distance-based, clustering-based, and support-vector-based anomaly detection techniques.

Thank you for reading this long post ^~^


