[์ดํƒ] ๊ฒฐ์ • ๊ฒฝ๊ณ„ ๊ธฐ๋ฐ˜ ์ด์ƒํƒ์ง€ ์•Œ๊ณ ๋ฆฌ์ฆ˜

Posted by Euisuk's Dev Log on May 14, 2024

[์ดํƒ] ๊ฒฐ์ • ๊ฒฝ๊ณ„ ๊ธฐ๋ฐ˜ ์ด์ƒํƒ์ง€ ์•Œ๊ณ ๋ฆฌ์ฆ˜

Original post: https://velog.io/@euisuk-chung/결정-경계-기반-이상탐지-알고리즘

Decision Boundary-Based Anomaly Detection

Decision boundary-based algorithms learn the distribution of the data to form an explicit boundary, and treat points that fall outside this boundary as outliers.

OCSVM (One-Class SVM)

  • ๋งํฌ: https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html
  • ์ •์˜: One-Class SVM(OCSVM)์€ ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ํ•™์Šตํ•˜์—ฌ ๊ฒฐ์ • ๊ฒฝ๊ณ„๋ฅผ ํ˜•์„ฑํ•˜๊ณ , ์ด ๊ฒฝ๊ณ„๋ฅผ ๋ฒ—์–ด๋‚˜๋Š” ์ ๋“ค์„ ์ด์ƒ์น˜๋กœ ๊ฐ„์ฃผํ•˜๋Š” ๋น„์ง€๋„ ํ•™์Šต ๊ธฐ๋ฐ˜ ์ด์ƒ์น˜ ํƒ์ง€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค.
  • ์ ํ•ฉํ•œ ๋ฐ์ดํ„ฐ: ์ฃผ๋กœ ์ •์ƒ ๋ฐ์ดํ„ฐ๋งŒ์œผ๋กœ ํ•™์Šตํ•˜๊ณ , ์ด์ƒ์น˜์˜ ๋น„์œจ์ด ๋งค์šฐ ๋‚ฎ๊ฑฐ๋‚˜ ์•Œ๋ ค์ง€์ง€ ์•Š์€ ๊ฒฝ์šฐ์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. ํŠนํžˆ ๊ณ ์ฐจ์› ๋ฐ์ดํ„ฐ ๋ถ„ํฌ์˜ ๊ฒฝ๊ณ„๋ฅผ ์ฐพ๋Š” ๋ฐ ์œ ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • sklearn ํ•จ์ˆ˜: sklearn.svm.OneClassSVM

    • Description:

      • OneClassSVM is an unsupervised learning algorithm used to estimate the support of a high-dimensional distribution. It is trained mostly on normal data to form a boundary, and data points that fall outside this boundary are identified as outliers.
      • The implementation is based on libsvm and can be used with a variety of kernel functions.
    • Parameters:

      • kernel (str, default='rbf'): Specifies the kernel type used by the algorithm. Choose one of 'linear', 'poly', 'rbf', 'sigmoid', or 'precomputed'.
      • degree (int, default=3): Degree of the polynomial kernel function ('poly'). Ignored by all other kernels.
      • gamma (str or float, default='scale'): Kernel coefficient for 'rbf', 'poly', and 'sigmoid'. 'scale' uses 1 / (n_features * X.var()), and 'auto' uses 1 / n_features.
      • coef0 (float, default=0.0): Independent term in the 'poly' and 'sigmoid' kernel functions.
      • tol (float, default=1e-3): Tolerance for the stopping criterion.
      • nu (float, default=0.5): An upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Must be in the interval (0, 1].
      • shrinking (bool, default=True): Whether to use the shrinking heuristic.
      • cache_size (float, default=200): Size of the kernel cache (in MB).
      • verbose (bool, default=False): Enables verbose output.
      • max_iter (int, default=-1): Maximum number of iterations within the solver. -1 means no limit.
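As a rough illustration of how nu behaves, the sketch below fits an RBF-kernel OneClassSVM with several nu values and reports the fraction of training points flagged as outliers alongside the fraction of support vectors. The synthetic Gaussian data, the seed, and the helper name fit_summary are assumptions made for this example and do not come from the original post.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)      # arbitrary fixed seed (assumption)
X = rng.normal(size=(200, 2))       # synthetic "normal" data (assumption)


def fit_summary(nu):
    """Hypothetical helper: fit an RBF OneClassSVM and return
    (fraction of training outliers, fraction of support vectors)."""
    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X)
    outlier_frac = float(np.mean(clf.predict(X) == -1))
    sv_frac = len(clf.support_) / len(X)
    return outlier_frac, sv_frac


results = {nu: fit_summary(nu) for nu in (0.05, 0.2, 0.5)}
for nu, (outlier_frac, sv_frac) in results.items():
    print(f"nu={nu:.2f}  training outliers={outlier_frac:.3f}  "
          f"support vectors={sv_frac:.3f}")
```

In runs like this, the share of flagged training points tends to track nu, while the share of support vectors stays at or above it. In practice nu is often set close to the outlier ratio one expects or is willing to tolerate.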
    • Attributes:

      • support_: Array of indices of the support vectors.
      • support_vectors_: Array of the support vectors.
      • dual_coef_: Coefficients of the support vectors in the decision function.
      • coef_: Array of weights assigned to the features when the kernel is 'linear'.
      • intercept_: Constant term in the decision function.
      • n_features_in_: Number of features seen during fit.
      • feature_names_in_: Array of the feature names seen during fit.
      • n_iter_: Number of iterations run to fit the model.
      • offset_: Offset used to define the decision function from the raw scores.
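To make the attributes above more concrete, here is a minimal sketch that fits a model and inspects them. The random data, the seed, and the variable names same_rows and offset_matches are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)     # arbitrary fixed seed (assumption)
X = rng.normal(size=(100, 3))       # synthetic data with 3 features (assumption)

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X)

print("support_ shape:        ", clf.support_.shape)
print("support_vectors_ shape:", clf.support_vectors_.shape)
print("dual_coef_ shape:      ", clf.dual_coef_.shape)
print("intercept_:            ", clf.intercept_)
print("offset_:               ", clf.offset_)
print("n_features_in_:        ", clf.n_features_in_)

# support_vectors_ holds the training rows selected by the indices in support_.
same_rows = np.allclose(clf.support_vectors_, X[clf.support_])

# offset_ is the negated intercept_.
offset_matches = np.allclose(clf.offset_, -clf.intercept_)
print("support_vectors_ == X[support_]:", same_rows)
print("offset_ == -intercept_:         ", offset_matches)

# coef_ is only available with a linear kernel.
linear_clf = OneClassSVM(kernel="linear", nu=0.1).fit(X)
print("coef_ shape (linear kernel):", linear_clf.coef_.shape)
```

Note that feature_names_in_ is only set when the training input carries string feature names (for example, a pandas DataFrame), so it is not accessed here.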
    • Methods:

      • decision_function(X): Computes the signed distance to the separating hyperplane.
      • fit(X[, y, sample_weight]): Detects the soft boundary of the sample set X.
      • fit_predict(X[, y]): Fits the model to the data and returns labels for that data.
      • get_metadata_routing(): Gets the metadata routing of this object.
      • get_params([deep]): Gets the parameters of this estimator.
      • predict(X): Classifies the given samples (+1 for inliers, -1 for outliers).
      • score_samples(X): Computes the raw scoring function of the samples.
      • set_fit_request(*[, sample_weight]): Sets the metadata request passed to the fit method.
      • set_params(**params): Sets the parameters of this estimator.
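The scoring methods above are closely related, and a small sketch may help show how. It assumes synthetic training data and three hand-picked query points (all hypothetical), and checks that decision_function equals score_samples minus offset_ and that fit_predict matches calling fit followed by predict on the same data.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)                  # arbitrary fixed seed (assumption)
X_train = rng.normal(size=(150, 2))             # synthetic training data (assumption)
X_new = np.array([[0.0, 0.0],
                  [0.5, -0.5],
                  [6.0, 6.0]])                  # last point lies far from the training data

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_train)

raw = clf.score_samples(X_new)          # raw scores
dec = clf.decision_function(X_new)      # raw scores shifted by offset_
labels = clf.predict(X_new)             # +1 = inlier, -1 = outlier

# decision_function is score_samples minus offset_.
relation_holds = np.allclose(dec, raw - clf.offset_)

# fit_predict is a shortcut for fit(X).predict(X).
same_labels = np.array_equal(
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit_predict(X_train),
    clf.predict(X_train),
)

print("raw scores:     ", raw)
print("decision values:", dec)
print("labels:         ", labels)
print("decision_function == score_samples - offset_:", relation_holds)
print("fit_predict == fit + predict:", same_labels)
```

A negative decision value corresponds to the outlier label -1, so decision_function is a convenient choice when a custom threshold or a ranking of samples is needed instead of hard labels.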
    • Example:

      from sklearn.svm import OneClassSVM
      import numpy as np

      # Example data: five samples with a single feature
      X = np.array([[0], [0.44], [0.45], [0.46], [1]])

      # Create the OCSVM model and fit it to the data
      clf = OneClassSVM(gamma='auto').fit(X)

      # Print the predicted labels (+1 = inlier, -1 = outlier)
      print(clf.predict(X))

      # Print the raw score of each sample
      print(clf.score_samples(X))
      

โœ๏ธ ์ด ์˜ˆ์ œ๋Š” OneClassSVM์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ„๋‹จํ•œ ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•œ ์ด์ƒ์น˜ ํƒ์ง€๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. OCSVM์€ ์ฃผ๋กœ ์ •์ƒ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ•˜์—ฌ ๊ฒฝ๊ณ„๋ฅผ ํ˜•์„ฑํ•˜๊ณ , ์ด ๊ฒฝ๊ณ„๋ฅผ ๋ฒ—์–ด๋‚˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ด์ƒ์น˜๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฐ ํšจ๊ณผ์ ์ž…๋‹ˆ๋‹ค.


