[Anomaly Detection] Distance-Based Anomaly Detection Algorithms

Posted by Euisuk's Dev Log on May 14, 2024

Original post: https://velog.io/@euisuk-chung/군집화-기반-이상탐지-알고리즘-sbjqco5v

Distance-Based Algorithms

Distance-based algorithms detect outliers from the distances between data points. A data point that lies far from the other points is treated as an outlier.

K-Means Clustering-Based Anomaly Detection

  • Link: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
  • Definition: Partitions the data into several clusters and detects outliers based on the distance from each cluster center. Data points that do not belong to any cluster, or that lie far from their cluster center, are treated as outliers.
  • Suitable data: Works well when the data has a clear cluster structure, that is, when it separates naturally into several groups.
  • sklearn function: sklearn.cluster.KMeans

    • Function description:

      • The K-Means algorithm partitions the data into K clusters, forming them by iteratively recomputing each cluster's center (centroid). Each data point is assigned to the nearest cluster center.
      • The algorithm runs quickly and efficiently even on large datasets, and outliers can be detected from how tightly the points in a cluster gather around its center.
    • ๋งค๊ฐœ๋ณ€์ˆ˜:

      • n_clusters (int, default=8): ํ˜•์„ฑํ•  ํด๋Ÿฌ์Šคํ„ฐ์˜ ์ˆ˜์ž…๋‹ˆ๋‹ค.
      • init (str, callable or array-like, default=โ€™k-means++โ€™): ์ดˆ๊ธฐ ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ์„ ์„ค์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. โ€˜k-means++โ€™๋Š” ์ˆ˜๋ ด ์†๋„๋ฅผ ๋†’์ด๊ธฐ ์œ„ํ•ด ์ดˆ๊ธฐ ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ์„ ์„ ํƒํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
      • n_init (int or str, default=โ€™autoโ€™): ๋‹ค๋ฅธ ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ ์‹œ๋“œ๋กœ K-Means ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‹คํ–‰ํ•  ํšŸ์ˆ˜์ž…๋‹ˆ๋‹ค. ์ตœ์ข… ๊ฒฐ๊ณผ๋Š” ์ด ์ค‘ ๊ฐ€์žฅ ์ข‹์€ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.
      • max_iter (int, default=300): ๋‹จ์ผ ์‹คํ–‰์—์„œ K-Means ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ตœ๋Œ€ ๋ฐ˜๋ณต ํšŸ์ˆ˜์ž…๋‹ˆ๋‹ค.
      • tol (float, default=1e-4): ์ˆ˜๋ ด์„ ์„ ์–ธํ•˜๊ธฐ ์œ„ํ•œ ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ ๋ณ€ํ™”์˜ ํ—ˆ์šฉ ์˜ค์ฐจ์ž…๋‹ˆ๋‹ค.
      • verbose (int, default=0): ์ƒ์„ธ ์ถœ๋ ฅ ๋ชจ๋“œ์ž…๋‹ˆ๋‹ค.
      • random_state (int, RandomState instance or None, default=None): ๋ฌด์ž‘์œ„ ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ ์ดˆ๊ธฐํ™”๋ฅผ ์œ„ํ•œ ๋‚œ์ˆ˜ ์ƒ์„ฑ๊ธฐ ์‹œ๋“œ์ž…๋‹ˆ๋‹ค.
      • copy_x (bool, default=True): True๋กœ ์„ค์ •ํ•˜๋ฉด ์›๋ณธ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ •ํ•˜์ง€ ์•Š๊ณ , False๋กœ ์„ค์ •ํ•˜๋ฉด ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ •ํ•ฉ๋‹ˆ๋‹ค.
      • algorithm (str, default=โ€™lloydโ€™): K-Means ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ โ€˜lloydโ€™์™€ โ€˜elkanโ€™ ์ค‘ ์„ ํƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. โ€˜elkanโ€™์€ ์‚ผ๊ฐ๋ถ€๋“ฑ์‹์„ ์‚ฌ์šฉํ•˜์—ฌ ํšจ์œจ์„ฑ์„ ๋†’์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
    • ์†์„ฑ:

      • cluster_centers_: ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ์˜ ์ขŒํ‘œ ๋ฐฐ์—ด์ž…๋‹ˆ๋‹ค.
      • labels_: ๊ฐ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์— ํ• ๋‹น๋œ ํด๋Ÿฌ์Šคํ„ฐ ๋ผ๋ฒจ ๋ฐฐ์—ด์ž…๋‹ˆ๋‹ค.
      • inertia_: ๊ฐ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์—์„œ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ๊นŒ์ง€์˜ ๊ฑฐ๋ฆฌ์˜ ํ•ฉ์ž…๋‹ˆ๋‹ค.
      • n_iter_: ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์‹คํ–‰๋œ ๋ฐ˜๋ณต ํšŸ์ˆ˜์ž…๋‹ˆ๋‹ค.
      • n_features_in_: fit ๋™์•ˆ ๋ณธ ํŠน์„ฑ์˜ ์ˆ˜์ž…๋‹ˆ๋‹ค.
      • feature_names_in_: fit ๋™์•ˆ ๋ณธ ํŠน์„ฑ ์ด๋ฆ„ ๋ฐฐ์—ด์ž…๋‹ˆ๋‹ค.
    • ๋ฉ”์„œ๋“œ:

      • fit(X[, y, sample_weight]): K-Means ํด๋Ÿฌ์Šคํ„ฐ๋ง์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
      • fit_predict(X[, y, sample_weight]): ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ์„ ๊ณ„์‚ฐํ•˜๊ณ  ๊ฐ ์ƒ˜ํ”Œ์— ๋Œ€ํ•œ ํด๋Ÿฌ์Šคํ„ฐ ์ธ๋ฑ์Šค๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.
      • fit_transform(X[, y, sample_weight]): ํด๋Ÿฌ์Šคํ„ฐ๋ง์„ ์ˆ˜ํ–‰ํ•˜๊ณ  X๋ฅผ ํด๋Ÿฌ์Šคํ„ฐ-๊ฑฐ๋ฆฌ ๊ณต๊ฐ„์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
      • get_feature_names_out([input_features]): ๋ณ€ํ™˜์„ ์œ„ํ•œ ์ถœ๋ ฅ ํŠน์„ฑ ์ด๋ฆ„์„ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.
      • get_metadata_routing(): ์ด ๊ฐ์ฒด์˜ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๋ผ์šฐํŒ…์„ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.
      • get_params([deep]): ์ด ์ถ”์ •๊ธฐ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.
      • predict(X[, sample_weight]): ๊ฐ ์ƒ˜ํ”Œ์ด ์†ํ•œ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.
      • score(X[, y, sample_weight]): K-Means ๋ชฉ์  ํ•จ์ˆ˜์˜ ๊ฐ’์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
      • set_fit_request(*[, sample_weight]): fit ๋ฉ”์„œ๋“œ์— ์ „๋‹ฌ๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์š”์ฒญ์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
      • set_output(*[, transform]): ์ถœ๋ ฅ ์ปจํ…Œ์ด๋„ˆ๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
      • set_params(**params): ์ด ์ถ”์ •๊ธฐ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
      • set_predict_request(*[, sample_weight]): predict ๋ฉ”์„œ๋“œ์— ์ „๋‹ฌ๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์š”์ฒญ์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
      • set_score_request(*[, sample_weight]): score ๋ฉ”์„œ๋“œ์— ์ „๋‹ฌ๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์š”์ฒญ์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
      • transform(X): X๋ฅผ ํด๋Ÿฌ์Šคํ„ฐ-๊ฑฐ๋ฆฌ ๊ณต๊ฐ„์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
    • ์˜ˆ์‹œ:

      from sklearn.cluster import KMeans
      import numpy as np

      # Example data
      X = np.array([[1, 2], [1, 4], [1, 0],
                    [10, 2], [10, 4], [10, 0]])

      # Create and fit the K-Means model
      kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)

      # Print the cluster labels
      print(kmeans.labels_)

      # Predict clusters for new data
      print(kmeans.predict([[0, 0], [12, 3]]))

      # Print the cluster centers
      print(kmeans.cluster_centers_)
      

✍️ This example shows how to cluster a simple dataset with K-Means. K-Means groups the data points into clusters and computes the center of each cluster, which gives an effective partition of the data.

KNN (K-Nearest Neighbors)

  • Link: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html
  • Definition: The K-Nearest Neighbors (KNN) algorithm finds the K nearest neighbors of each data point and characterizes the point from the distances to those neighbors. Used this way it is an unsupervised learning method, applied mainly to outlier detection, data classification, and clustering.
  • Suitable data: Effective on datasets where distance is a meaningful measure, and suited to structured data for which distances between data points can be computed. For outlier detection in particular, outliers can be found from the local density of data points.
  • sklearn function: sklearn.neighbors.NearestNeighbors

    • Function description:

      • NearestNeighbors is an unsupervised learner that finds the nearest neighbors based on the distances between data points. It supports a range of distance metrics and search algorithms, and mainly performs k-neighbor and radius-neighbor searches.
    • ๋งค๊ฐœ๋ณ€์ˆ˜:

      • n_neighbors (int, default=5): K-์ด์›ƒ ํƒ์ƒ‰ ์‹œ ์‚ฌ์šฉํ•  ์ด์›ƒ์˜ ์ˆ˜์ž…๋‹ˆ๋‹ค.
      • radius (float, default=1.0): ๋ฐ˜๊ฒฝ ์ด์›ƒ ํƒ์ƒ‰ ์‹œ ์‚ฌ์šฉํ•  ๋ฐ˜๊ฒฝ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค.
      • algorithm (str, default=โ€™autoโ€™): ์ด์›ƒ์„ ์ฐพ๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•  ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. โ€˜autoโ€™, โ€˜ball_treeโ€™, โ€˜kd_treeโ€™, โ€˜bruteโ€™ ์ค‘ ์„ ํƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
      • leaf_size (int, default=30): BallTree ๋˜๋Š” KDTree์—์„œ ์‚ฌ์šฉํ•  ๋ฆฌํ”„์˜ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค. ์ด๋Š” ํŠธ๋ฆฌ ๊ตฌ์ถ• ๋ฐ ์ฟผ๋ฆฌ ์†๋„์— ์˜ํ–ฅ์„ ์ค๋‹ˆ๋‹ค.
      • metric (str or callable, default=โ€™minkowskiโ€™): ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ์— ์‚ฌ์šฉํ•  ๋ฉ”ํŠธ๋ฆญ์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. โ€˜minkowskiโ€™๋Š” p=2์ผ ๋•Œ ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ์™€ ๋™์ผํ•ฉ๋‹ˆ๋‹ค.
      • p (float, default=2): Minkowski ๋ฉ”ํŠธ๋ฆญ์—์„œ ์‚ฌ์šฉํ•  ํŒŒ๋ผ๋ฏธํ„ฐ์ž…๋‹ˆ๋‹ค. p=1์€ ๋งจํ•ดํŠผ ๊ฑฐ๋ฆฌ, p=2๋Š” ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ์— ํ•ด๋‹นํ•ฉ๋‹ˆ๋‹ค.
      • metric_params (dict, default=None): ๋ฉ”ํŠธ๋ฆญ ํ•จ์ˆ˜์˜ ์ถ”๊ฐ€ ๋งค๊ฐœ๋ณ€์ˆ˜์ž…๋‹ˆ๋‹ค.
      • n_jobs (int, default=None): ๋ณ‘๋ ฌ๋กœ ์‹คํ–‰ํ•  ์ž‘์—…์˜ ์ˆ˜๋ฅผ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. -1๋กœ ์„ค์ •ํ•˜๋ฉด ๋ชจ๋“  ํ”„๋กœ์„ธ์„œ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
    • ์†์„ฑ:

      • effective_metric_: ์ด์›ƒ ๊ฒ€์ƒ‰์— ์‚ฌ์šฉ๋œ ๊ฑฐ๋ฆฌ ๋ฉ”ํŠธ๋ฆญ์ž…๋‹ˆ๋‹ค.
      • effective_metric_params_: ๊ฑฐ๋ฆฌ ๋ฉ”ํŠธ๋ฆญ์— ์‚ฌ์šฉ๋œ ๋งค๊ฐœ๋ณ€์ˆ˜์ž…๋‹ˆ๋‹ค.
      • n_features_in_: fit ๋™์•ˆ ๋ณธ ํŠน์„ฑ์˜ ์ˆ˜์ž…๋‹ˆ๋‹ค.
      • feature_names_in_: fit ๋™์•ˆ ๋ณธ ํŠน์„ฑ ์ด๋ฆ„ ๋ฐฐ์—ด์ž…๋‹ˆ๋‹ค.
      • n_samples_fit_: ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ์ƒ˜ํ”Œ ์ˆ˜์ž…๋‹ˆ๋‹ค.
    • ๋ฉ”์„œ๋“œ:

      • fit(X[, y]): ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์…‹์—์„œ ์ตœ๊ทผ์ ‘ ์ด์›ƒ ์ถ”์ •๊ธฐ๋ฅผ ํ•™์Šต์‹œํ‚ต๋‹ˆ๋‹ค.
      • get_metadata_routing(): ์ด ๊ฐ์ฒด์˜ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๋ผ์šฐํŒ…์„ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.
      • get_params([deep]): ์ด ์ถ”์ •๊ธฐ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.
      • kneighbors([X, n_neighbors, return_distance]): ํŠน์ • ํฌ์ธํŠธ์˜ K-์ด์›ƒ์„ ์ฐพ์Šต๋‹ˆ๋‹ค.
      • kneighbors_graph([X, n_neighbors, mode]): ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์— ๋Œ€ํ•œ K-์ด์›ƒ ๊ทธ๋ž˜ํ”„(๊ฐ€์ค‘์น˜ ํฌํ•จ)๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
      • radius_neighbors([X, radius, ...]): ํŠน์ • ๋ฐ˜๊ฒฝ ๋‚ด์˜ ์ด์›ƒ์„ ์ฐพ์Šต๋‹ˆ๋‹ค.
      • radius_neighbors_graph([X, radius, mode, ...]): ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์— ๋Œ€ํ•œ ๋ฐ˜๊ฒฝ ์ด์›ƒ ๊ทธ๋ž˜ํ”„(๊ฐ€์ค‘์น˜ ํฌํ•จ)๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
      • set_params(**params): ์ด ์ถ”์ •๊ธฐ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
    • ์˜ˆ์‹œ:

      from sklearn.neighbors import NearestNeighbors
      import numpy as np

      # Example data
      samples = [[0, 0, 2], [1, 0, 0], [0, 0, 1]]

      # Create and fit the NearestNeighbors model
      neigh = NearestNeighbors(n_neighbors=2, radius=0.4)
      neigh.fit(samples)

      # k-neighbor search
      print(neigh.kneighbors([[0, 0, 1.3]], 2, return_distance=False))

      # Radius-neighbor search
      nbrs = neigh.radius_neighbors([[0, 0, 1.3]], 0.4, return_distance=False)
      print(np.asarray(nbrs[0][0]))
      

✍️ This example shows how to run k-neighbor and radius-neighbor searches on a simple dataset with NearestNeighbors. KNN finds nearby neighbors from the distances between data points, which makes it a useful algorithm for characterizing data and detecting outliers.
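
The neighbor search above can be extended into an outlier score. The sketch below is one common formulation, offered as an assumption rather than the post's own method: each point is scored by the distance to its k-th nearest neighbor, and a point is flagged when its score exceeds three times the median score. The toy data, `k = 2`, and the factor of three are hypothetical. Calling `kneighbors()` without arguments queries the fitted points themselves and does not count a point as its own neighbor.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: five points near the origin and one far away.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [0.0, 0.3], [0.3, 0.0], [5.0, 5.0]])

k = 2
nn = NearestNeighbors(n_neighbors=k).fit(X)

# With no query argument, neighbors of the training points are returned
# and each point is excluded from its own neighbor list.
distances, indices = nn.kneighbors()

# Anomaly score: distance to the k-th nearest neighbor (last column).
kth_dist = distances[:, -1]

# Assumed threshold rule for illustration: three times the median score.
threshold = 3 * np.median(kth_dist)
flags = kth_dist > threshold

print(kth_dist.round(3), np.flatnonzero(flags))
```

A larger k makes the score less sensitive to small groups of outliers that sit close to each other, at the cost of blurring very local structure.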

Differences Between K-Means and KNN

K-Means and K-Nearest Neighbors (KNN) both take a distance-based approach, but they differ in purpose and in how they are used. Here the two algorithms are compared from an anomaly detection perspective.

  • K-Means: Groups the data into clusters and detects outliers based on how tightly the points gather within each cluster. It mainly uses clustering to uncover the structure of the data and then find outliers.

    • How it is used: Groups the entire dataset into clusters and judges outliers by their distance from the cluster center.

    • Clustering: Forms explicit clusters and evaluates distances within them.

  • KNN: Evaluates density from the distances between each data point and its neighbors to detect outliers. It mainly assesses the density around individual data points.

    • How it is used: Computes the distances between each data point and its neighbors to evaluate density, and judges low-density points to be outliers.

    • Clustering: Forms no clusters and evaluates the distance from each data point to its neighbors directly.

In summary, both K-Means and KNN can perform distance-based anomaly detection, but K-Means evaluates the distance from cluster centers obtained through clustering, whereas KNN evaluates the distance from each individual data point to its neighbors.


