[Anomaly Detection] Statistics-Based Anomaly Detection Algorithms

Posted by Euisuk's Dev Log on May 14, 2024

Original post: https://velog.io/@euisuk-chung/통계-기반-이상탐지-알고리즘-8rjzifg0

Statistics-Based Anomaly Detection

With a statistical approach, a model learns the "normal" behavior of the data and flags data points whose behavior is statistically unusual as anomalies.

GMM (Gaussian Mixture Models)

  • Link: https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html
  • Definition: A Gaussian Mixture Model (GMM) models the data as a mixture of several Gaussian distributions and computes the probability that each data point belongs to each component. This makes it usable for clustering as well as for anomaly detection.
  • Suitable data: Effective when the data can be assumed to follow one or more normal distributions, and useful for identifying latent clusters in the data.
  • sklearn class: sklearn.mixture.GaussianMixture

    • Description:

      • GaussianMixture is a class that estimates the probability distribution of the data with a Gaussian mixture model. It fits the parameters of each Gaussian component with the EM (Expectation-Maximization) algorithm.
      • The algorithm assumes the data is generated from several Gaussian distributions and computes the probability that each data point belongs to a given Gaussian component.
    • Parameters:

      • n_components (int, default=1): Number of mixture components. Corresponds to the number of clusters.
      • covariance_type (str, default='full'): Form of the covariance matrices. One of 'full', 'tied', 'diag', 'spherical'.
      • tol (float, default=1e-3): Convergence threshold for EM. Iteration stops once the gain in the log-likelihood lower bound falls below this value.
      • reg_covar (float, default=1e-6): Non-negative regularization added to the diagonal of the covariance matrices. Helps ensure the covariance matrices stay positive definite.
      • max_iter (int, default=100): Maximum number of EM iterations.
      • n_init (int, default=1): Number of initializations to perform. The best result is kept.
      • init_params (str, default='kmeans'): Initialization method. One of 'kmeans', 'k-means++', 'random', 'random_from_data'.
      • weights_init (array-like, default=None): Initial weights. If not provided, they are initialized using the init_params method.
      • means_init (array-like, default=None): Initial means. If not provided, they are initialized using the init_params method.
      • precisions_init (array-like, default=None): Initial precisions (inverses of the covariance matrices). If not provided, they are initialized using the init_params method.
      • random_state (int, RandomState instance or None, default=None): Seed for random number generation, so that results are reproducible.
      • warm_start (bool, default=False): If True, the solution of the previous fit is used as the initialization for the next call to fit.
      • verbose (int, default=0): Prints progress information during fitting.
      • verbose_interval (int, default=10): Number of iterations between progress printouts.
    • Attributes:

      • weights_: Weights of each mixture component.
      • means_: Mean of each mixture component.
      • covariances_: Covariance matrix of each mixture component.
      • precisions_: Precision matrix of each mixture component.
      • precisions_cholesky_: Cholesky decomposition of the precision matrix of each mixture component.
      • converged_: Whether the model converged.
      • n_iter_: Number of EM steps the best fit needed to reach convergence.
      • lower_bound_: Lower bound on the log-likelihood computed by the EM algorithm (for the best fit).
      • n_features_in_: Number of features seen during fit.
      • feature_names_in_: Names of the features seen during fit.
    • Methods:

      • aic(X): Computes the Akaike information criterion (AIC) of the current model on the data X.
      • bic(X): Computes the Bayesian information criterion (BIC) of the current model on the data X.
      • fit(X[, y]): Estimates the model parameters with the EM algorithm.
      • fit_predict(X[, y]): Estimates the model parameters from X and predicts the labels for X.
      • get_metadata_routing(): Gets the metadata routing of this object.
      • get_params([deep]): Gets the parameters of this estimator.
      • predict(X): Predicts the labels of the samples in X using the fitted model.
      • predict_proba(X): Evaluates the components' density for each sample, i.e. the posterior probability of each component given the sample.
      • sample([n_samples]): Generates random samples from the fitted Gaussian mixture.
      • score(X[, y]): Computes the per-sample average log-likelihood of the data X.
      • score_samples(X): Computes the log-likelihood of each sample.
      • set_params(**params): Sets the parameters of this estimator.
    • Example:

      from sklearn.mixture import GaussianMixture
      import numpy as np

      # Example data: two well-separated groups of points
      X = np.array([[1, 2], [1, 4], [1, 0],
                    [10, 2], [10, 4], [10, 0]])

      # Create the GMM and fit it to the data
      gm = GaussianMixture(n_components=2, random_state=0).fit(X)

      # Print the mean of each mixture component
      print(gm.means_)

      # Predict the component labels of new points
      print(gm.predict([[0, 0], [12, 3]]))

✏️ This example uses GaussianMixture to model the data with two Gaussian components and to predict the cluster of new data points. Because a GMM learns the underlying distribution of the data, it is well suited to both clustering and anomaly detection.
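The example above only predicts cluster labels. As a minimal sketch of the anomaly-detection use, the per-sample log-likelihood from score_samples can be compared against a cutoff. The synthetic data, the 1st-percentile threshold, and the names X_train, X_new, and is_anomaly are assumptions made for illustration, not part of the original post.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic "normal" data: two Gaussian clusters (illustrative values only).
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=[1.0, 2.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[10.0, 2.0], scale=0.5, size=(200, 2)),
])

# Fit a two-component mixture to the normal data.
gm = GaussianMixture(n_components=2, random_state=0).fit(X_train)

# Per-sample log-likelihood under the fitted mixture; lower means less typical.
train_scores = gm.score_samples(X_train)

# Assumption: treat anything below the 1st percentile of training scores as anomalous.
threshold = np.percentile(train_scores, 1)

# One point near a cluster center, one far from both clusters.
X_new = np.array([[1.2, 2.1], [5.5, 8.0]])
new_scores = gm.score_samples(X_new)
is_anomaly = new_scores < threshold
print(is_anomaly)
```

The percentile is a tunable choice: a lower percentile flags fewer points, and in practice it would be set from the expected anomaly rate or from labeled validation data.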

PCA (Principal Component Analysis)

  • Link: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
  • Definition: PCA (Principal Component Analysis) reduces dimensionality by analyzing the principal components of the data. It captures the main directions of variance, and anomalies can be detected from the difference between the original data and its reconstruction.
  • Suitable data: Useful for high-dimensional datasets when you want to identify the main sources of variance and mitigate the curse of dimensionality. It is mainly used for data visualization, noise removal, and feature extraction.
  • sklearn class: sklearn.decomposition.PCA

    • Description:

      • PCA is a linear dimensionality reduction technique that uses Singular Value Decomposition (SVD) to project the data onto a lower-dimensional space. The input data is centered (but not scaled) for each feature before the SVD extracts the principal components.
      • It uses either the LAPACK implementation of the full SVD or a randomized truncated SVD, depending on the shape of the data and the number of components to extract.
    • Parameters:

      • n_components (int, float or 'mle', default=None): Number of principal components to keep. By default all components are kept.
      • copy (bool, default=True): If True, the input data is copied during fit; if False, the data passed to fit may be overwritten.
      • whiten (bool, default=False): If True, the projected outputs are scaled to have unit variance per component. This discards some information about relative variance scales but can sometimes improve the predictive accuracy of downstream estimators.
      • svd_solver (str, default='auto'): How the SVD is computed. One of 'auto', 'full', 'arpack', 'randomized'.
      • tol (float, default=0.0): Tolerance for the singular values computed by the 'arpack' solver.
      • iterated_power (int or 'auto', default='auto'): Number of iterations of the power method used by the 'randomized' solver.
      • n_oversamples (int, default=10): Number of additional random vectors sampled by the 'randomized' solver.
      • power_iteration_normalizer (str, default='auto'): Power iteration normalization method for the 'randomized' solver.
      • random_state (int, RandomState instance or None, default=None): Controls random number generation so that results are reproducible.
    • Attributes:

      • components_: Array of principal component vectors (the principal axes in feature space).
      • explained_variance_: Amount of variance explained by each selected component.
      • explained_variance_ratio_: Fraction of the variance explained by each selected component.
      • singular_values_: Singular value corresponding to each selected component.
      • mean_: Empirical mean of each feature.
      • n_components_: Estimated number of components.
      • n_samples_: Number of samples in the training data.
      • noise_variance_: Estimated noise covariance.
      • n_features_in_: Number of features seen during fit.
      • feature_names_in_: Names of the features seen during fit.
    • Methods:

      • fit(X[, y]): Fits the model to the data X.
      • fit_transform(X[, y]): Fits the model to the data X and applies the dimensionality reduction to it.
      • get_covariance(): Computes the data covariance with the generative model.
      • get_feature_names_out([input_features]): Gets the output feature names of the transformation.
      • get_metadata_routing(): Gets the metadata routing of this object.
      • get_params([deep]): Gets the parameters of this estimator.
      • get_precision(): Computes the data precision matrix with the generative model.
      • inverse_transform(X): Transforms data back to the original space.
      • score(X[, y]): Returns the average log-likelihood of all samples.
      • score_samples(X): Returns the log-likelihood of each sample.
      • set_output(*[, transform]): Sets the output container.
      • set_params(**params): Sets the parameters of this estimator.
      • transform(X): Applies the dimensionality reduction to the data X.
    • Example:

      from sklearn.decomposition import PCA
      import numpy as np

      # Example data
      X = np.array([[-1, -1], [-2, -1], [-3, -2],
                    [1, 1], [2, 1], [3, 2]])

      # Create the PCA model and fit it to the data
      pca = PCA(n_components=2)
      pca.fit(X)

      # Print the fraction of variance explained by each component
      print(pca.explained_variance_ratio_)

      # Print the singular values
      print(pca.singular_values_)

      # Project new data onto the principal components
      X_new = pca.transform([[0, 0], [12, 3]])
      print(X_new)

✏️ This example uses PCA to extract the principal components of the data and to project new data points into the principal component space. By capturing the main variance in the data and reducing its dimensionality, PCA is useful for data analysis and visualization.
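The definition above mentions detecting anomalies from the difference between the original data and its reconstruction. The following is a minimal sketch of that idea, assuming synthetic data that lies close to a line, a hypothetical helper reconstruction_error, and a 99th-percentile cutoff chosen only for illustration. Note that n_components must be smaller than the number of features here; with all components kept, the reconstruction is exact and the error carries no signal.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic "normal" data lying close to the line y = 0.5 * x (illustrative only).
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X_train = np.hstack([t, 0.5 * t]) + rng.normal(scale=0.05, size=(200, 2))

# Keep fewer components than features so that the reconstruction is lossy.
pca = PCA(n_components=1).fit(X_train)


def reconstruction_error(model, X):
    """Hypothetical helper: squared distance between samples and their PCA reconstruction."""
    X_hat = model.inverse_transform(model.transform(X))
    return np.sum((X - X_hat) ** 2, axis=1)


train_err = reconstruction_error(pca, X_train)

# Assumption: flag anything above the 99th percentile of training errors.
threshold = np.percentile(train_err, 99)

# One point on the line, one far off it.
X_new = np.array([[1.0, 0.5], [1.0, -2.0]])
new_err = reconstruction_error(pca, X_new)
is_anomaly = new_err > threshold
print(is_anomaly)
```

Points that follow the dominant direction of the training data reconstruct almost perfectly, while points that deviate from it lose the off-axis part in the projection and get a large error.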


