[Anomaly Detection] Density-Based Anomaly Detection Algorithms

Posted by Euisuk's Dev Log on May 14, 2024

Original post: https://velog.io/@euisuk-chung/밀도-기반-이상탐지-알고리즘

Density-Based Anomaly Detection

Density-based algorithms detect outliers from the local density around each data point. They can work well even when the data are not uniformly distributed.
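To make "local density" concrete, here is a minimal sketch that counts how many other points fall within a fixed radius of each point. The helper name `neighbor_counts` and the radius of 3.0 are assumptions chosen for illustration; this is not how any particular library implements it.

```python
import numpy as np

def neighbor_counts(X, radius):
    """Count, for each point, how many OTHER points lie within `radius`."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via broadcasting, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Subtract 1 so that a point does not count itself.
    return (dists <= radius).sum(axis=1) - 1

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])
counts = neighbor_counts(X, radius=3.0)

print(counts)       # number of neighbors within the radius, per point
print(counts == 0)  # True marks points with no neighbor at all
```

A point whose count is much lower than that of the other points sits in a sparse region, which is the intuition that both DBSCAN and LOF build on.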

DBSCAN

  • Link: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
  • Definition: Forms clusters from regions where data points are densely packed, and treats points that do not have enough neighbors within a given distance as outliers.
  • Suitable data: Datasets whose clusters have irregular shapes and are mixed with noise. Because eps is a single global threshold, clusters with very different densities can be hard to separate with one setting.
  • sklearn class: sklearn.cluster.DBSCAN

    • Description:

      • DBSCAN stands for "Density-Based Spatial Clustering of Applications with Noise". It finds core samples in high-density regions and expands clusters from them. It is particularly useful for identifying clusters of data points with similar density.
      • The algorithm stays robust when the data contain a lot of noise, and it works regardless of cluster shape and size.
    • Parameters:

      • eps (float, default=0.5): The maximum distance between two samples for one to be counted as being in the neighborhood of the other. Choosing an appropriate eps has a decisive effect on the quality of the result.
      • min_samples (int, default=5): The minimum number of samples in a point's neighborhood, the point itself included, for it to be a core point. Higher values require denser regions to form a cluster.
      • metric (str or callable, default='euclidean'): The metric used to compute distances between instances.
      • algorithm (str, default='auto'): The algorithm used to find neighbors. One of 'auto', 'ball_tree', 'kd_tree', or 'brute'.
    • Attributes:

      • core_sample_indices_: Array of the indices of the core samples.
      • components_: Array holding a copy of each core sample's feature values.
      • labels_: Array of cluster labels assigned to each data point. Noise points receive the label -1.
    • Methods:

      • fit(X[, y, sample_weight]): Performs DBSCAN clustering from features or a distance matrix.
      • fit_predict(X[, y, sample_weight]): Computes clusters from data or a distance matrix and returns the labels.
      • get_metadata_routing(): Gets the metadata routing of this object.
      • get_params([deep]): Gets the parameters of this estimator.
      • set_fit_request(*[, sample_weight]): Configures the metadata requested for the fit method.
      • set_params(**params): Sets the parameters of this estimator.
    • Example:

      from sklearn.cluster import DBSCAN
      import numpy as np

      # Example data: two tight groups and one far-away point
      X = np.array([[1, 2], [2, 2], [2, 3],
                    [8, 7], [8, 8], [25, 80]])

      # Create the DBSCAN model and fit it to the data
      clustering = DBSCAN(eps=3, min_samples=2).fit(X)

      # Print the cluster labels (noise points are labeled -1)
      print(clustering.labels_)
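
Continuing with the same toy data, the sketch below reads the fitted attributes listed earlier to count clusters and pull out the noise points. The variable names (`db`, `core_mask`, `n_clusters`, `n_noise`) are my own and only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])
db = DBSCAN(eps=3, min_samples=2).fit(X)

labels = db.labels_
# Label -1 marks noise, so it is not counted as a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

# Boolean mask that is True for core samples.
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True

print(n_clusters, n_noise)   # 2 clusters, 1 noise point
print(core_mask)             # which points are core samples
print(X[labels == -1])       # the points flagged as outliers
```

For anomaly detection, the rows selected by `labels == -1` are the ones of interest.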
      

✍️ This example shows how to cluster a simple dataset with DBSCAN. Depending on its parameter settings, DBSCAN can cluster a wide variety of data structures, and it is particularly effective on nonlinear, complex cluster shapes.
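To see how much the result depends on the parameters, here is a small sketch that refits DBSCAN on the same toy data with several eps values. The eps values are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

results = {}
for eps in (0.5, 3, 10, 100):
    # fit_predict returns the same values that are stored in labels_.
    results[eps] = DBSCAN(eps=eps, min_samples=2).fit_predict(X).tolist()
    print(f"eps={eps}: {results[eps]}")

# eps=0.5 -> every point is noise (no two points are that close)
# eps=3   -> two clusters plus one noise point
# eps=10  -> the two groups merge; the far point is still noise
# eps=100 -> everything falls into a single cluster
```

If eps is too small, everything looks like an outlier; if it is too large, nothing does. In practice the value has to be matched to the scale of the data.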

LOF (Local Outlier Factor)

  • Link: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html
  • Definition: LOF (Local Outlier Factor) is an outlier detection algorithm that compares the density at a given data point with the density at its neighbors and treats points with relatively low density as outliers. It computes an outlier score by measuring each data point's local density deviation.
  • Suitable data: Datasets with non-uniform density, especially when outliers are to be found from differences in local density.
  • sklearn class: sklearn.neighbors.LocalOutlierFactor

    • Description:

      • LOF stands for "Local Outlier Factor". It detects outliers by evaluating the local density of each data point, measuring how isolated a point is relative to its surrounding neighbors.
      • The algorithm is effective at detecting density-based outliers, especially when the data are not uniformly distributed.
    • Parameters:

      • n_neighbors (int, default=20): The number of neighbors used in the k-nearest-neighbors queries. This value strongly affects the local density estimate.
      • algorithm (str, default='auto'): The algorithm used to find neighbors. One of 'auto', 'ball_tree', 'kd_tree', or 'brute'.
      • leaf_size (int, default=30): The leaf size used by BallTree or KDTree. It affects the speed of tree construction and queries.
      • metric (str or callable, default='minkowski'): The metric used for distance computation. The default is the Minkowski distance.
      • p (float, default=2): The parameter of the Minkowski metric. p=2 is the Euclidean distance and p=1 is the Manhattan distance.
      • metric_params (dict, default=None): Additional keyword arguments for the metric function.
      • contamination ('auto' or float, default='auto'): The proportion of outliers in the dataset. 'auto' uses the threshold proposed in the original paper; a float must lie in the range (0, 0.5].
      • novelty (bool, default=False): Set to True to use LOF for novelty detection, that is, to detect outliers in new, unseen data.
      • n_jobs (int, default=None): The number of parallel jobs to run for the neighbor search. -1 uses all processors.
    • Attributes:

      • negative_outlier_factor_: Array holding the opposite (negated) LOF of the training samples. Higher values are more normal: inliers score close to -1, while outliers have large negative scores.
      • n_neighbors_: The number of neighbors actually used.
      • offset_: The offset used to turn raw scores into labels. It is -1.5 when contamination='auto'; otherwise it is derived from the contamination value.
      • effective_metric_: The metric actually used for distance computation.
      • effective_metric_params_: The additional keyword arguments of the metric function.
      • n_features_in_: The number of features seen during fit.
      • feature_names_in_: Array of the feature names seen during fit.
      • n_samples_fit_: The number of samples in the fitted data.
    • Methods:

      • decision_function(X): Returns the shifted opposite of the Local Outlier Factor (LOF) of X. Available only when novelty=True.
      • fit(X[, y]): Fits the LOF detector on the training dataset.
      • fit_predict(X[, y]): Fits the model to the training set X and returns the labels. Not available when novelty=True.
      • get_metadata_routing(): Gets the metadata routing of this object.
      • get_params([deep]): Gets the parameters of this estimator.
      • kneighbors([X, n_neighbors, return_distance]): Finds the k nearest neighbors of a point.
      • kneighbors_graph([X, n_neighbors, mode]): Computes the (weighted) graph of k nearest neighbors for the data points.
      • predict([X]): Predicts labels for X according to LOF (inlier: 1, outlier: -1). Available only when novelty=True.
      • score_samples(X): Returns the opposite of the Local Outlier Factor (LOF) of X. Available only when novelty=True.
      • set_params(**params): Sets the parameters of this estimator.
    • Example:

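      from sklearn.neighbors import LocalOutlierFactor
      import numpy as np

      # Example data: 101.1 lies far from the other three values
      X = np.array([[-1.1], [0.2], [101.1], [0.3]])

      # Create the LOF model and fit it; fit_predict returns 1 (inlier) or -1 (outlier)
      clf = LocalOutlierFactor(n_neighbors=2)
      y_pred = clf.fit_predict(X)

      # Print the predicted labels and the outlier scores (negated LOF)
      print(y_pred)
      print(clf.negative_outlier_factor_)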
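
The labels returned by fit_predict follow directly from negative_outlier_factor_ and offset_. As a rough sketch on the same toy data, the threshold can be applied by hand and compared with the library's labels. The variable name `manual` is only for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.array([[-1.1], [0.2], [101.1], [0.3]])
clf = LocalOutlierFactor(n_neighbors=2)   # contamination='auto' by default
y_pred = clf.fit_predict(X)

scores = clf.negative_outlier_factor_
# Samples scoring below offset_ are labeled -1 (outlier), the rest 1 (inlier).
manual = np.where(scores < clf.offset_, -1, 1)

print(clf.offset_)                        # -1.5 with contamination='auto'
print(np.array_equal(manual, y_pred))     # True
```

Because the scores are negated LOF values, "more negative" means "more anomalous".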
      

✍️ This example shows how to detect outliers in a simple dataset with LocalOutlierFactor. By comparing each point's density with that of its neighbors, LOF identifies outliers effectively, particularly in data whose local density varies from region to region.
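The novelty parameter changes how the estimator is used, so here is a minimal sketch of that mode. The training data is synthetic (a 2-D Gaussian with a fixed seed) and the two query points are made up for illustration; neither comes from the original post.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic "normal" training data: 200 points around the origin (assumption).
rng = np.random.RandomState(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Two hypothetical new observations: one typical, one far away.
X_new = np.array([[0.1, -0.2], [8.0, 8.0]])

# novelty=True enables predict / score_samples / decision_function on new data.
clf = LocalOutlierFactor(n_neighbors=20, novelty=True)
clf.fit(X_train)

pred = clf.predict(X_new)               # 1 = inlier, -1 = outlier
raw = clf.score_samples(X_new)          # negated LOF of the new points
shifted = clf.decision_function(X_new)  # score_samples minus offset_

print(pred)
print(raw)
print(shifted)
```

With novelty=True, fit_predict is not available, and these methods are meant for new data only, not for the training set itself.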


