[Paper Review] EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision

Posted by Euisuk's Dev Log on August 30, 2025


Original post: https://velog.io/@euisuk-chung/Paper-Review-EXAONE-Path-2.0-Pathology-Foundation-Model-with-End-to-End-Supervision

https://arxiv.org/abs/2507.06639

[1] PYEON, Myeongjang, et al. EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision. arXiv preprint arXiv:2507.06639, 2025.

Abstract

๋””์ง€ํ„ธ ๋ณ‘๋ฆฌํ•™์—์„œ whole-slide images (WSIs)๋Š” gigapixel ๊ทœ๋ชจ๋กœ ์ธํ•ด ์ฒ˜๋ฆฌ๊ฐ€ ์–ด๋ ค์šด ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋Œ€๋ถ€๋ถ„์˜ ์ ‘๊ทผ๋ฒ•์€ self-supervised learning (SSL)์„ ํ†ตํ•ด patch encoder๋ฅผ ํ›ˆ๋ จ์‹œํ‚จ ๋‹ค์Œ, multiple instance learning (MIL) ๋˜๋Š” slide encoder๋ฅผ ํ†ตํ•ด patch-level embedding์„ ์ง‘๊ณ„ํ•˜์—ฌ downstream ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ patch-level SSL์€ mutation status ๋ฐ ๋ถ„์ž ํŠน์„ฑ๊ณผ ๊ฐ™์€ biomarker ์˜ˆ์ธก์— ํ•„์ˆ˜์ ์ธ ๋ณต์žกํ•œ ๋„๋ฉ”์ธ ํŠนํ™” ํŠน์„ฑ์„ ๊ฐ„๊ณผํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. SSL ๋ฐฉ๋ฒ•๋“ค์€ ์ž‘์€ patch-level ์˜์—ญ์—์„œ ์ž์—ฐ ์ด๋ฏธ์ง€ ๋„๋ฉ”์ธ์„ ์œ„ํ•ด ์„ ํƒ๋œ ๊ธฐ๋ณธ augmentation์—๋งŒ ์˜์กดํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ SSL ๋ฐฉ๋ฒ•๋“ค์€ ์™„์ „ ์ง€๋„ํ•™์Šต ์ ‘๊ทผ๋ฒ•๋ณด๋‹ค ๋ฐ์ดํ„ฐ ํšจ์œจ์„ฑ์ด ๋–จ์–ด์ง€๋ฉฐ, ๊ฒฝ์Ÿ๋ ฅ ์žˆ๋Š” ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ๊ด‘๋ฒ”์œ„ํ•œ ๊ณ„์‚ฐ ์ž์›๊ณผ ๋ฐ์ดํ„ฐ์…‹์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

To address these limitations, we present EXAONE Path 2.0, a pathology foundation model that learns patch-level representations under direct slide-level supervision. Trained on only 37k WSIs, EXAONE Path 2.0 achieves state-of-the-art average performance across 10 biomarker prediction tasks, demonstrating remarkable data efficiency.

Figure 1: Model performance compared against the number of parameters and the number of WSIs used for training. The average AUROC is obtained by averaging AUROC scores over the 10 biomarker prediction tasks. Notably, EXAONE Path 2.0 achieves higher performance despite using fewer parameters and fewer WSIs than the other models, demonstrating its efficiency.

  1. Introduction

๋””์ง€ํ„ธ ๋ณ‘๋ฆฌํ•™์€ AI ๊ธฐ๋ฐ˜ ์˜๋ฃŒ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์˜ ํ•ต์‹ฌ ๋„๋ฉ”์ธ์œผ๋กœ ๋ถ€์ƒํ•˜์˜€์œผ๋ฉฐ, whole-slide images (WSIs)๋Š” gigapixel ๊ทœ๋ชจ๋กœ ์ธํ•ด ๋…ํŠนํ•œ ๊ณ„์‚ฐ์  ๊ณผ์ œ๋ฅผ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ํ˜„์žฌ์˜ ์ ‘๊ทผ๋ฒ•๋“ค์€ ์ผ๋ฐ˜์ ์œผ๋กœ 2๋‹จ๊ณ„ ํŒจ๋Ÿฌ๋‹ค์ž„์„ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค: DINO์™€ DINOv2์™€ ๊ฐ™์€ self-supervised learning ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด patch-level encoder๋ฅผ ํ›ˆ๋ จ์‹œํ‚จ ๋‹ค์Œ, downstream ์˜ˆ์ธก ์ž‘์—…์„ ์œ„ํ•ด multiple-instance learning (MIL) ๋˜๋Š” slide-level encoder๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ patch-level embedding์„ ์ง‘๊ณ„ํ•ฉ๋‹ˆ๋‹ค.

์ด ํŒจ๋Ÿฌ๋‹ค์ž„์€ ์œ ๋งํ•จ์„ ๋ณด์˜€์ง€๋งŒ, ๋””์ง€ํ„ธ ๋ณ‘๋ฆฌํ•™ ๋ถ„์•ผ์—์„œ ๊ทผ๋ณธ์ ์ธ ํ•œ๊ณ„๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. Self-supervised patch-level pretraining์€ mutation status ๋˜๋Š” ๊ธฐํƒ€ ๋ถ„์ž ํŠน์„ฑ๊ณผ ๊ฐ™์€ biomarker ์˜ˆ์ธก์— ํ•„์ˆ˜์ ์ธ ๋ณต์žกํ•œ ๋„๋ฉ”์ธ ํŠนํ™” ํŠน์„ฑ์„ ํฌ์ฐฉํ•œ๋‹ค๊ณ  ๋ณด์žฅํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. Self-supervised learning (SSL) ๋ฐฉ๋ฒ•๋“ค์ด ์ž‘์€ patch-level ์˜์—ญ์—์„œ ์ž์—ฐ ์ด๋ฏธ์ง€ ๋„๋ฉ”์ธ์„ ์œ„ํ•ด ์„ ํƒ๋œ ๊ธฐ๋ณธ augmentation์—๋งŒ ์˜์กดํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ ์ด๋Ÿฌํ•œ ์ ‘๊ทผ๋ฒ•๋“ค์€ ์™„์ „ ์ง€๋„ํ•™์Šต ๋ฐฉ๋ฒ•์— ๋น„ํ•ด ๋ฐ์ดํ„ฐ ํšจ์œจ์„ฑ์ด ๋–จ์–ด์ง€๋ฉฐ, ๊ฒฝ์Ÿ๋ ฅ ์žˆ๋Š” ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ๊ด‘๋ฒ”์œ„ํ•œ ๊ณ„์‚ฐ ์ž์›๊ณผ ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

To address these limitations, we introduce EXAONE Path 2.0, a pathology foundation model that learns patch-level representations under direct slide-level supervision. Our approach differs fundamentally from existing methods by incorporating multiple slide-level labels during patch encoder training, allowing the model to learn clinically relevant features more effectively.

์šฐ๋ฆฌ์˜ ๊ฒฐ๊ณผ๋Š” EXAONE Path 2.0์ด ๊ฒฝ์Ÿํ•˜๋Š” ๋ฐฉ๋ฒ•๋“ค๋ณด๋‹ค ์‹ค์งˆ์ ์œผ๋กœ ์ ์€ ํ›ˆ๋ จ ์ƒ˜ํ”Œ์„ ์š”๊ตฌํ•˜๋ฉด์„œ๋„ ๋ชจ๋“  ํ‰๊ฐ€๋œ ์ž‘์—…์—์„œ ์šฐ์ˆ˜ํ•œ ํ‰๊ท  ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•จ์„ ๋ณด์—ฌ์ฃผ๋ฉฐ, ๊ณ„์‚ฐ ๋ณ‘๋ฆฌํ•™์—์„œ ์ค‘์š”ํ•œ ๋ฐœ์ „์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.

  2. Modeling

2.1 Overcoming the Prohibitive Computational Cost of Training on Gigapixel Images

Training on gigapixel whole-slide images poses substantial computational challenges due to memory constraints and processing requirements. To address them, we use a combination of hierarchical architecture design, curriculum learning, and efficient memory management techniques.

Architecture design: We adopt a three-stage Hierarchical Image Pyramid Transformer (HIPT) architecture. Instead of processing a gigapixel image directly at full resolution, this hierarchical design processes patches at progressively higher levels of abstraction, reducing computational complexity and enabling more efficient processing of large WSIs. The first-stage ViT processes individual patches, the second-stage ViT aggregates patch-level features into region-level representations, and the third-stage ViT integrates all region-level features to process the entire slide.
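
To make the patch → region → slide flow concrete, here is a minimal PyTorch sketch of the three-stage hierarchy. It is an illustrative assumption on my part: toy `nn.TransformerEncoder` blocks stand in for the actual ViTs, and the tokenizer and dimensions are arbitrary, so this is not the released EXAONE Path 2.0 code.

```python
# Minimal sketch of a 3-stage hierarchical encoder (patch -> region -> slide).
import torch
import torch.nn as nn

def make_encoder(dim: int, depth: int = 2) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class HierarchicalWSIEncoder(nn.Module):
    def __init__(self, dim: int = 384, num_classes: int = 2):
        super().__init__()
        self.patch_proj = nn.Linear(3 * 16 * 16, dim)  # toy tokenizer for 16x16 RGB tokens
        self.stage1 = make_encoder(dim)                # patch-level ViT (256x256 patches)
        self.stage2 = make_encoder(dim)                # region-level ViT (larger regions)
        self.stage3 = make_encoder(dim)                # slide-level ViT
        self.head = nn.Linear(dim, num_classes)        # slide-level prediction head

    def forward(self, patch_tokens):
        # patch_tokens: (regions, patches_per_region, tokens_per_patch, 3*16*16)
        R, P, T, D = patch_tokens.shape
        x = self.patch_proj(patch_tokens.view(R * P, T, D))
        patch_feats = self.stage1(x).mean(dim=1)                          # (R*P, dim)
        region_feats = self.stage2(patch_feats.view(R, P, -1)).mean(dim=1)  # (R, dim)
        slide_feat = self.stage3(region_feats.unsqueeze(0)).mean(dim=1)     # (1, dim)
        return self.head(slide_feat)                                        # slide logits

logits = HierarchicalWSIEncoder()(torch.randn(4, 8, 64, 3 * 16 * 16))
print(logits.shape)  # torch.Size([1, 2])
```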

Curriculum learning: To manage the computational burden of training all stages end-to-end at once, we implement a two-phase curriculum learning approach with progressive resolution scaling. In the first curriculum phase, a 256×256 DINO loss is applied to the first-stage ViT and a 1024×1024 DINO loss to the second-stage ViT, building hierarchical visual representations without requiring full three-stage end-to-end computation. In the next curriculum phase, the 256×256 DINO loss on the first-stage ViT is kept, the second-stage ViT is scaled up to 4096×4096 regions, and a slide-level supervised cross-entropy loss is introduced that propagates gradients through the full three-stage model processing the entire slide. This curriculum greatly reduces computational overhead by avoiding the need to process every stage at maximum resolution in every training iteration.
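
The phase-dependent loss composition can be summarized with a small helper like the one below. The DINO losses are taken as precomputed inputs (their teacher–student machinery is omitted), and whether a stage-2 DINO term remains in phase 2 is not reproduced here, so treat this as a hedged sketch of the schedule rather than the actual training code.

```python
import torch
import torch.nn.functional as F

def curriculum_loss(phase, dino_s1, dino_s2, slide_logits=None, slide_label=None):
    """Compose the losses active in each curriculum phase (illustrative sketch)."""
    if phase == 1:
        # Phase 1: self-distillation only — 256x256 DINO on stage 1, 1024x1024 DINO on stage 2.
        return dino_s1 + dino_s2
    # Phase 2: keep the stage-1 DINO loss, enlarge stage-2 inputs to 4096x4096 regions,
    # and add a slide-level cross-entropy term whose gradients reach all three stages.
    return dino_s1 + F.cross_entropy(slide_logits, slide_label)

# toy usage with dummy precomputed DINO losses
loss = curriculum_loss(2, torch.tensor(0.8), torch.tensor(0.0),
                       slide_logits=torch.randn(1, 2), slide_label=torch.tensor([1]))
```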

Memory management: To further manage the computational demands of full-WSI processing, we use activation checkpointing and a CPU offloading strategy. Instead of loading all patch embeddings into GPU memory at once, activations are computed and transferred dynamically as needed during the supervised loss computation. This greatly reduces memory requirements while maintaining training efficiency, making it possible to process gigapixel images with limited computational resources.
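
As a rough illustration, the sketch below combines `torch.utils.checkpoint` (activations recomputed during the backward pass) with offloading patch embeddings to CPU between stages. The function and batching scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
from torch.utils.checkpoint import checkpoint

def encode_patches_offloaded(stage1, patch_batches):
    """Encode one WSI's patches in mini-batches, recomputing activations in the
    backward pass (checkpointing) and parking the embeddings in CPU memory."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    feats_cpu = []
    for patches in patch_batches:               # e.g. an iterable of (B, 3, 256, 256) tensors
        patches = patches.to(device, non_blocking=True)
        feats = checkpoint(stage1, patches, use_reentrant=False)  # no stored activations
        feats_cpu.append(feats.to("cpu"))       # offload embeddings to CPU RAM
    # embeddings can be moved back to GPU in chunks when the slide-level loss is computed
    return torch.cat(feats_cpu, dim=0)
```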

2.2 Learning Generalizable Representations Across Multiple Biomarker Prediction Tasks

๊ณ„์‚ฐ ํšจ์œจ์„ฑ์„ ์œ ์ง€ํ•˜๋ฉด์„œ ๋‹ค์–‘ํ•œ biomarker ์˜ˆ์ธก ์ž‘์—…์— ๊ฑธ์ณ ์ผ๋ฐ˜ํ™”๋˜๋Š” representation์„ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•ด, ์šฐ๋ฆฌ๋Š” downstream ์ž‘์—… ์ ์‘์„ ์œ„ํ•œ early exit ์ „๋žต๊ณผ ๊ฒฐํ•ฉ๋œ multi-task learning ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

Multi-task learning framework: We implement a multi-task learning approach that jointly optimizes over multiple complementary objectives. Training covers three main task categories: (1) cancer subtype classification across 33 cancer types, (2) tissue type classification across 12 organ systems, and (3) molecular biomarker prediction, including pan-cancer and cancer-specific mutation status, microsatellite instability, and hormone receptor subtype classification. By optimizing jointly over these diverse prediction targets, this multi-task strategy encourages the model to learn shared representations that capture fundamental pathological patterns across multiple scales of biological organization. Joint optimization also helps prevent overfitting to any individual task while improving generalization across the full spectrum of downstream applications.
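
Conceptually, the joint objective amounts to a shared slide representation feeding one head per task, with each slide contributing a loss term only for the tasks it is labeled for. The sketch below makes that explicit; the task names and dimensions are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSlideHead(nn.Module):
    def __init__(self, dim, task_classes):
        super().__init__()
        # one linear head per task: cancer subtype, tissue type, each molecular biomarker, ...
        self.heads = nn.ModuleDict({t: nn.Linear(dim, c) for t, c in task_classes.items()})

    def forward(self, slide_feat, labels):
        # sum cross-entropy only over the tasks for which this slide actually has a label
        total = slide_feat.new_zeros(())
        for task, label in labels.items():
            total = total + F.cross_entropy(self.heads[task](slide_feat), label)
        return total

heads = MultiTaskSlideHead(384, {"subtype": 33, "tissue": 12, "msi": 2, "egfr": 2})
loss = heads(torch.randn(1, 384), {"subtype": torch.tensor([5]), "msi": torch.tensor([1])})
```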

Downstream ์ ์‘์„ ์œ„ํ•œ Early Exit ์ „๋žต: ์†Œ๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์™€ ๊นŠ์€ ๋„คํŠธ์›Œํฌ ํ™˜๊ฒฝ์—์„œ ๊ณผ์ ํ•ฉ์„ ๋”์šฑ ์™„ํ™”ํ•˜๊ธฐ ์œ„ํ•ด, ์šฐ๋ฆฌ๋Š” ์ „์ฒด hierarchical ๋ชจ๋ธ๋ณด๋‹ค๋Š” early representation์„ ํ™œ์šฉํ•˜๋Š” shallow network ์ ‘๊ทผ๋ฒ•์„ ์ฑ„ํƒํ•ฉ๋‹ˆ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ, ์šฐ๋ฆฌ๋Š” downstream ์ž‘์—… ์ ์‘์„ ์œ„ํ•ด Clustering-constrained Attention Multiple Instance Learning (CLAM)๊ณผ ๊ฒฐํ•ฉ๋œ ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„ ๋ชจ๋ธ์„ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ „์ฒด hierarchical network๋ฅผ fine-tuningํ•˜๋Š” ๋Œ€์‹ , ์ด early exit ์ ‘๊ทผ๋ฒ•์€ ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„ ๋ชจ๋ธ์˜ robustํ•œ patch-level ํŠน์„ฑ์„ ์‚ฌ์šฉํ•˜๋ฉด์„œ CLAM์ด ์ด๋Ÿฌํ•œ ํŠน์„ฑ์„ slide-level ์˜ˆ์ธก์„ ์œ„ํ•ด ํšจ์œจ์ ์œผ๋กœ ์ง‘๊ณ„ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ „๋žต์€ downstream ์ž‘์—… ์ ์‘ ์ค‘ ๊ณ„์‚ฐ ์˜ค๋ฒ„ํ—ค๋“œ๋ฅผ ํฌ๊ฒŒ ์ค„์ด๋ฉด์„œ ์ œํ•œ๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ๋ณ‘๋ฆฌํ•™ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ ์ผ๋ฐ˜์ ์œผ๋กœ ๊ด€์ฐฐ๋˜๋Š” ๊ณผ์ ํ•ฉ์˜ ํ•จ์ •์„ ํ”ผํ•ฉ๋‹ˆ๋‹ค.

  3. Experiments

3.1 Training Data

EXAONE Path 2.0์€ 37,195๊ฐœ์˜ Formalin-Fixed, Paraffin-Embedded (FFPE) Hematoxylin and Eosin (H&E) ์—ผ์ƒ‰ WSI์—์„œ ํ›ˆ๋ จ๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ WSI๋Š” 16๊ฐœ ํ›ˆ๋ จ ์ž‘์—…์— ๊ฑธ์ณ 144,450๊ฐœ์˜ ์ด๋ฏธ์ง€-๋ผ๋ฒจ ์Œ์„ ์ƒ์„ฑํ•˜๋ฉฐ, ๊ฐ WSI๋Š” ์•” ์•„ํ˜• ๋ถ„๋ฅ˜, ์กฐ์ง ๋ถ„๋ฅ˜, biomarker ์˜ˆ์ธก์„ ํฌํ•จํ•œ ๋‹ค์–‘ํ•œ ์˜ˆ์ธก ๋ชฉํ‘œ์— ํ•ด๋‹นํ•˜๋Š” ์—ฌ๋Ÿฌ ๋ผ๋ฒจ์— ๊ธฐ์—ฌํ•ฉ๋‹ˆ๋‹ค.

3.2 Baselines

We selected a diverse set of foundation models as baselines to cover both slide-level and patch-level approaches to slide-level classification. For slide-level models, we included TITAN, PRISM, CHIEF, and Prov-GigaPath, which produce slide-level representations that can be used directly for downstream tasks. We also included EXAONE Path 1.0 and UNI2-h as patch-level foundation model baselines. Although these models operate on local regions of a slide, their design and prior applications align naturally with slide-level prediction tasks when combined with an appropriate aggregation strategy. In our experiments, a CLAM-based aggregator was applied to the patch-level features to produce slide-level predictions.

3.3 Evaluation Protocols

๊ฐ ๋ชจ๋ธ์€ ์‚ฌ์ „ ํ›ˆ๋ จ๋œ foundation model ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๊ณ ์ •ํ•œ ์ฑ„ ์•„ํ‚คํ…์ฒ˜ ์„ค๊ณ„์— ๋”ฐ๋ผ slide-level ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•ด fine-tuning๋˜์—ˆ์Šต๋‹ˆ๋‹ค. Slide-level foundation model์˜ ๊ฒฝ์šฐ, ๊ณ ์ •๋œ backbone์— ์˜ํ•ด ์ƒ์„ฑ๋œ slide-level representation ์œ„์— ์„ ํ˜• ๋ถ„๋ฅ˜ ๋ ˆ์ด์–ด๋ฅผ ํ›ˆ๋ จํ–ˆ์Šต๋‹ˆ๋‹ค. Patch-level foundation model์˜ ๊ฒฝ์šฐ, UNI์—์„œ ์ œ์•ˆ๋œ ์ ‘๊ทผ๋ฒ•์„ ์ฑ„ํƒํ•˜์—ฌ patch-level ํŠน์„ฑ์— CLAM ์ง‘๊ณ„๊ธฐ๋ฅผ ์ ์šฉํ•˜์—ฌ slide-level ์˜ˆ์ธก์„ ์ƒ์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๊ฐ€ ์ œ์•ˆํ•œ ๋ชจ๋ธ๋„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„ ๋ชจ๋ธ์—์„œ ์ถ”์ถœ๋œ patch-level ํŠน์„ฑ์„ ํ™œ์šฉํ•˜๋ฉฐ, ์ดํ›„ slide-level ์ถ”๋ก ์„ ์œ„ํ•ด CLAM์„ ํ†ตํ•ด ์ง‘๊ณ„๋ฉ๋‹ˆ๋‹ค. ๊ฐ ๋ฒค์น˜๋งˆํฌ ์ž‘์—…์€ ์‚ฌ์ „ ์ •์˜๋œ ํ›ˆ๋ จ/ํ…Œ์ŠคํŠธ ๋ถ„ํ• ์—์„œ ํ‰๊ฐ€๋˜์—ˆ์œผ๋ฉฐ, ๋‹ค์–‘ํ•œ ๋ฌด์ž‘์œ„ ์‹œ๋“œ๋ฅผ ๊ฐ€์ง„ 4๋ฒˆ์˜ ๋…๋ฆฝ์ ์ธ ํ›ˆ๋ จ ์‹คํ–‰์— ๋Œ€ํ•œ ํ‰๊ท  ์„ฑ๋Šฅ์„ ๋ณด๊ณ ํ•ฉ๋‹ˆ๋‹ค.

3.4 Slide-Level Benchmarks

๋ชจ๋ธ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•˜๊ธฐ ์œ„ํ•ด, ํ์„ ์•”์ข…, ์œ ๋ฐฉ์•”, ๊ฒฐ์žฅ์ง์žฅ์•”, ์‹ ์žฅ์•”์„ ํฌํ•จํ•œ ๋‹ค์–‘ํ•œ ์•” ๋ณ‘๋ณ€์—์„œ ํŒŒ์ƒ๋œ ์ด 10๊ฐœ์˜ slide-level ๋ฒค์น˜๋งˆํฌ ์ž‘์—…์„ ๊ตฌ์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฒค์น˜๋งˆํฌ๋Š” ๊ฐœ์ธ ๋ฐ์ดํ„ฐ์…‹์—์„œ 4๊ฐœ ์ž‘์—…๊ณผ ๊ณต๊ฐœ ๋ฐ์ดํ„ฐ์…‹์—์„œ 6๊ฐœ ์ž‘์—…์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์ž‘์—… ๋‹ค์–‘์„ฑ๊ณผ ๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ ์†Œ์Šค ๋ฐ ๊ธฐ๊ด€์—์„œ์˜ ๋ชจ๋ธ ์ผ๋ฐ˜ํ™”๋ฅผ ๋ชจ๋‘ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ์‹ ์ค‘ํ•˜๊ฒŒ ์„ ํƒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

3.4.1 ๊ฐœ์ธ ๋ฐ์ดํ„ฐ์…‹์˜ ๋ฒค์น˜๋งˆํฌ

์ด๋Ÿฌํ•œ ๋ฒค์น˜๋งˆํฌ๋Š” ํ•œ๊ตญ์˜ ํ•œ ์ข…ํ•ฉ๋ณ‘์›(KOR)๊ณผ ๋ฏธ๊ตญ์˜ ๋‘ ์ข…ํ•ฉ๋ณ‘์›(USA1, USA2)๊ณผ์˜ ํ˜‘๋ ฅ์œผ๋กœ ์ˆ˜์ง‘๋œ ๋‚ด๋ถ€ ๋ฐ์ดํ„ฐ์…‹์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋“  ๋ฐ์ดํ„ฐ ์‚ฌ์šฉ์€ ์—ฐ๊ตฌ ๋ชฉ์ ์œผ๋กœ ํ•ด๋‹น ๊ธฐ๊ด€์œค๋ฆฌ์œ„์›ํšŒ(IRB)์˜ ์Šน์ธ์„ ๋ฐ›์•˜์Šต๋‹ˆ๋‹ค. ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋Š” ์ต๋ช…ํ™”๋˜์–ด ๋‚ด๋ถ€ ์‚ฌ์šฉ๋งŒ์„ ์œ„ํ•ด ์ž ๊ฒจ ์žˆ์œผ๋ฉฐ, ๋‚ด๋ถ€ ์„ฑ๋Šฅ ํ‰๊ฐ€๋ฅผ ์œ„ํ•ด์„œ๋งŒ ์—„๊ฒฉํ•˜๊ฒŒ ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

LUAD-TMB: This task predicts tumor mutation burden (TMB) status (high vs. low) from lung adenocarcinoma WSIs. TMB is defined as the number of mutations per megabase from DNA sequencing, and a threshold of 10 is used to distinguish high from low. The model was trained on KOR-LUAD (low:high = 1063:287) and tested on the USA1-LUAD (137:117) dataset.
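
For reference, the labeling is just a threshold on mutations per megabase; whether the cut-off of 10 is inclusive is an assumption in this tiny sketch, since the paper only states the threshold value.

```python
def tmb_label(mutations_per_megabase: float) -> str:
    """Binarize TMB with the 10 mut/Mb threshold used in LUAD-TMB
    (treating >= 10 as "high" is an assumption here)."""
    return "high" if mutations_per_megabase >= 10 else "low"

print(tmb_label(12.3))  # high
```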

LUAD-EGFR: This task detects the presence of EGFR mutations in lung adenocarcinoma. Mutations of clinical tier 2 or above are labeled "mutated," and all others are labeled "wild type." Training used KOR-LUAD (wild:mut = 1145:205), and testing was performed on USA1-LUAD (242:12).

LUAD-KRAS: This task identifies KRAS mutations in lung adenocarcinoma WSIs, using the same clinical mutation criterion as the EGFR task. Training used KOR-LUAD (wild:mut = 1217:133), and testing was performed on USA2-LUAD (347:168).

CRC-MSI: This task classifies microsatellite instability (MSI) status in colorectal adenocarcinoma. The model was trained on KOR-CRC (stable:unstable = 2630:831) and tested on a separate portion of the same dataset (658:209).

3.4.2 Benchmarks on Public Datasets

์ด๋Ÿฌํ•œ ๋ฒค์น˜๋งˆํฌ๋Š” ๊ณ„์‚ฐ ๋ณ‘๋ฆฌํ•™ ์—ฐ๊ตฌ์—์„œ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ๊ณต๊ฐœ ๋ฐ์ดํ„ฐ์…‹์ธ CPTAC์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ตฌ์„ฑ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

BRCA-TP53, PIK3CA: These tasks predict TP53 and PIK3CA mutation status from breast cancer WSIs. Both use the CPTAC-BRCA dataset, with TP53 split into train (wild:mut = 53:37) and test (14:8), and PIK3CA into train (58:33) and test (14:7).

RCC-PBRM1, BAP1: These tasks focus on detecting PBRM1 and BAP1 mutations in clear cell renal cell carcinoma (CCRCC). Both benchmarks use the CPTAC-CCRCC dataset, with PBRM1 split into train (wild:mut = 97:96) and test (26:26), and BAP1 into train (156:39) and test (46:4).

COAD-KRAS, TP53: These tasks classify KRAS and TP53 mutation status in colon adenocarcinoma. Both use the CPTAC-COAD dataset, with KRAS split into train (wild:mut = 50:29) and test (11:8), and TP53 into train (53:27) and test (12:6).

3.5 Evaluation Results

Table 1์€ 10๊ฐœ slide-level ๋ฒค์น˜๋งˆํฌ ์ž‘์—…์—์„œ 7๊ฐœ ๋ชจ๋ธ์˜ ๋น„๊ต ์„ฑ๋Šฅ์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ํ‰๊ฐ€๋œ ๋ชจ๋“  ๋ชจ๋ธ ์ค‘์—์„œ EXAONE Path 2.0์€ ๊ฐ€์žฅ ๋†’์€ ์ „์ฒด ํ‰๊ท  ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์กฐ์ง ์œ ํ˜•, ๊ธฐ๊ด€, ์˜ˆ์ธก ๋Œ€์ƒ์— ๊ฑธ์นœ robustํ•œ ์ •ํ™•๋„์™€ ์ผ๊ด€๋œ ์ผ๋ฐ˜ํ™”๋ฅผ ๋ชจ๋‘ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

On the lung adenocarcinoma tasks, EXAONE Path 2.0 performed strongly on EGFR mutation prediction, achieving the highest AUROC (0.853) on the USA1-LUAD dataset. On the KRAS mutation task, the model recorded the best score (0.645) on the USA2-LUAD dataset, outperforming all other baselines. On TMB classification, EXAONE Path 2.0 trailed EXAONE Path 1.0 and TITAN slightly but remained comparable to the best-performing models.

๊ฒฐ์žฅ์ง์žฅ์•” MSI ๋ถ„๋ฅ˜์—์„œ EXAONE Path 2.0์€ ๋‹ค๋ฅธ foundation model๋“ค๊ณผ ๋™๋“ฑํ•œ ๋†’์€ ์ •ํ™•๋„(0.938)๋ฅผ ์œ ์ง€ํ–ˆ์œผ๋ฉฐ, ํ…Œ์ŠคํŠธ ์„ธํŠธ์—์„œ ์•ˆ์ •์ ์ธ ์ผ๋ฐ˜ํ™”๋ฅผ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

์œ ๋ฐฉ์•” ์ž‘์—…์—์„œ ๋ชจ๋ธ์€ ๋ชจ๋“  mutation (TP53, PIK3CA) ๋ฒค์น˜๋งˆํฌ์—์„œ ์ผ๊ด€๋˜๊ฒŒ ๊ฐ•๋ ฅํ•œ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค. ํ•ญ์ƒ ๊ฐ€์žฅ ๋†’์€ ์ ์ˆ˜๋ฅผ ๋‹ฌ์„ฑํ•˜์ง€๋Š” ์•Š์•˜์ง€๋งŒ, ์ œํ•œ๋œ ํ›ˆ๋ จ ์ƒ˜ํ”Œ์ด ์žˆ๋Š” ๋„์ „์ ์ธ ๋ถ„๋ฅ˜ ์‹œ๋‚˜๋ฆฌ์˜ค์—์„œ๋„ ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๋Š” ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

RCC ๋ฒค์น˜๋งˆํฌ์—์„œ EXAONE Path 2.0์€ BAP1 mutation ์ž‘์—…์—์„œ ๋ช…ํ™•ํ•œ ์šฐ์›”์„ฑ์„ ๋ณด์˜€์œผ๋ฉฐ ๊ฐ€์žฅ ๋†’์€ ์ ์ˆ˜(0.807)๋ฅผ ๋‹ฌ์„ฑํ–ˆ๊ณ , PBRM1 ๋ฒค์น˜๋งˆํฌ์—์„œ๋„ ๊ฒฝ์Ÿ๋ ฅ ์žˆ๋Š” ์„ฑ๋Šฅ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.

๊ฒฐ์žฅ์„ ์•”์ข… ๋ฒค์น˜๋งˆํฌ์—์„œ ๋ชจ๋ธ์€ KRAS ์˜ˆ์ธก์—์„œ ๊ฑฐ์˜ ์ตœ์ ์— ๊ฐ€๊นŒ์šด ์ ์ˆ˜์ธ 0.912์™€ TP53 mutation ๋ถ„๋ฅ˜์—์„œ 0.875๋ฅผ ํฌํ•จํ•˜์—ฌ ์ตœ๊ณ  ์ˆ˜์ค€์˜ ๊ฒฐ๊ณผ์— ๋„๋‹ฌํ–ˆ์Šต๋‹ˆ๋‹ค.

์ „๋ฐ˜์ ์œผ๋กœ EXAONE Path 2.0์€ ์ตœ๊ณ ์˜ ํ‰๊ท  AUROC ์ ์ˆ˜๋ฅผ ๋‹ฌ์„ฑํ–ˆ์œผ๋ฉฐ ๊ฑฐ์˜ ๋ชจ๋“  ์ž‘์—…์—์„œ ์ƒ์œ„ 3์œ„ ์•ˆ์— ๋จธ๋ฌผ๋ €์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ฒฐ๊ณผ๋Š” ์šฐ๋ฆฌ์˜ ํ†ตํ•ฉ๋œ hierarchical ํ”„๋ ˆ์ž„์›Œํฌ์™€ end-to-end ์ตœ์ ํ™” ์ „๋žต์˜ ์ด์ ์„ ์‹ค์ฆ์ ์œผ๋กœ ๊ฒ€์ฆํ•˜๋ฉฐ, EXAONE Path 2.0์ด ๊ด‘๋ฒ”์œ„ํ•œ slide-level ๋ณ‘๋ฆฌํ•™ ์ž‘์—…์„ ์œ„ํ•œ ๊ฐ•๋ ฅํ•˜๊ณ  ์ผ๋ฐ˜ํ™” ๊ฐ€๋Šฅํ•œ foundation model ์—ญํ• ์„ ํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

๋ชจ๋“  ๋ฒค์น˜๋งˆํฌ์— ๊ฑธ์นœ ์ „์ฒด์ ์ธ ๋น„๊ต๋ฅผ ์ œ๊ณตํ•˜๊ธฐ ์œ„ํ•ด, ์šฐ๋ฆฌ๋Š” radar ๋ฐ bar chart๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ ์„ฑ๋Šฅ์„ ์‹œ๊ฐํ™”ํ–ˆ์Šต๋‹ˆ๋‹ค(Figure 3). ์ฐจํŠธ๋Š” 10๊ฐœ ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์…‹์—์„œ ๊ฐ ๋ชจ๋ธ์˜ AUROC๋ฅผ ๋ณด์—ฌ์ฃผ๋ฉฐ, ์„ฑ๋Šฅ ์ผ๊ด€์„ฑ์˜ ์ง๊ด€์ ์ธ ์ดํ•ด๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ๋ณด์—ฌ์ง„ ๋ฐ”์™€ ๊ฐ™์ด, EXAONE Path 2.0์€ ๋ชจ๋“  ๋ฒค์น˜๋งˆํฌ์—์„œ ์ผ๊ด€๋˜๊ฒŒ ๊ฐ•๋ ฅํ•œ ๋ฒ”์œ„๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ์ด๋Š” ํŠน์ • ์ž‘์—…์—์„œ ์„ฑ๋Šฅ ์ €ํ•˜๋ฅผ ๋ณด์ด๋Š” ๋‹ค๋ฅธ ๋งŽ์€ foundation model๋“ค์— ๋น„ํ•ด ์šฐ์ˆ˜ํ•œ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ๊ณผ ๊ฒฌ๊ณ ์„ฑ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. EXAONE Path 2.0์˜ ์‹œ๊ฐ์ ์œผ๋กœ ์ง€๋ฐฐ์ ์ธ ํ”„๋กœํŒŒ์ผ์€ ๊ทธ๊ฒƒ์˜ ์„ ๋„์ ์ธ ํ‰๊ท  ์„ฑ๋Šฅ์„ ๊ฐ•ํ™”ํ•˜๊ณ  ๋ฒ”์šฉ slide-level foundation model๋กœ์„œ์˜ ์ ํ•ฉ์„ฑ์„ ๊ฐ•์กฐํ•ฉ๋‹ˆ๋‹ค.

  4. Conclusion

์šฐ๋ฆฌ๋Š” ์ง์ ‘์ ์ธ slide-level ์ง€๋„ํ•™์Šต ํ•˜์—์„œ patch-level representation์„ ํ•™์Šตํ•˜๋Š” ๋ณ‘๋ฆฌํ•™ foundation model์ธ EXAONE Path 2.0์„ ์ œ์‹œํ–ˆ์Šต๋‹ˆ๋‹ค. ์šฐ๋ฆฌ์˜ ์ ‘๊ทผ๋ฒ•์€ slide-level supervised signal์ด ๋ชจ๋“  hierarchical ๋‹จ๊ณ„๋ฅผ ํ†ตํ•ด ์ „ํŒŒ๋˜๋„๋ก ํ•˜์—ฌ ์ž„์ƒ์ ์œผ๋กœ ๊ด€๋ จ๋œ representation์˜ end-to-end ํ•™์Šต์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

์šฐ๋ฆฌ์˜ ๋ฐฉ๋ฒ•์€ hierarchical architecture ์„ค๊ณ„, curriculum learning, activation checkpointing๊ณผ CPU offloading์„ ํฌํ•จํ•œ ๋ฉ”๋ชจ๋ฆฌ ๊ด€๋ฆฌ ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ๊ณ„์‚ฐ์  ๊ณผ์ œ๋ฅผ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ๋‹ค์–‘ํ•œ biomarker ์˜ˆ์ธก ์ž‘์—…์— ๊ฑธ์นœ multi-task learning์„ ์‚ฌ์šฉํ•˜๊ณ  ์†Œ๊ทœ๋ชจ ๋ฐ์ดํ„ฐ ํ™˜๊ฒฝ์—์„œ ๊ณผ์ ํ•ฉ์„ ์™„ํ™”ํ•˜๊ธฐ ์œ„ํ•ด early exit ์ „๋žต์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

Experimental results show that EXAONE Path 2.0, trained on only 37k WSIs, achieves competitive average performance across 10 biomarker prediction tasks, demonstrating improved data efficiency over existing foundation models. The model performs consistently across diverse cancer types and prediction targets.

These results show that direct slide-level supervision can effectively teach clinically relevant features, and that our proposed methods successfully address the computational challenges of gigapixel image training, providing a practical approach to building pathology foundation models.


