[Paper Review] K-EXAONE Technical Report

Posted by Euisuk's Dev Log on February 16, 2026


https://arxiv.org/abs/2601.01739

๋„์ž…: ํ•œ๊ตญ AI ์ƒํƒœ๊ณ„์˜ ๋„์ „๊ณผ K-EXAONE์˜ ํƒ„์ƒ

Competition to develop global large language models (LLMs) is intensifying. Closed-source models still hold the performance lead, but open-weight models are aggressively scaling past hundreds of billions of parameters into the trillion-parameter range and rapidly closing the gap.

Korea, however, faces a distinctive challenge. With relatively few AI-dedicated data centers and AI chips compared to the leading countries, it has so far concentrated on cost-efficient, smaller models in the tens of billions of parameters. Yet securing a real foundation for AI transformation requires a large-scale model with globally top-tier performance. To close this infrastructure gap, the Korean government launched a strategic program that supplies core resources such as GPUs, and K-EXAONE is the model LG AI Research built on that support.

K-EXAONE inherits the hybrid architecture of its predecessor EXAONE 4.0 (unified reasoning/non-reasoning modes) while introducing three key innovations. First, it adopts the Mixture-of-Experts (MoE) paradigm, activating only 23B of its 236B total parameters at inference for efficient scaling. Second, it extends multilingual coverage from the previous three languages (Korean, English, Spanish) to six by adding German, Japanese, and Vietnamese. Third, it supports a 256K-token context window for real-world long-context applications.

Model Architecture Design

Fine-Grained Sparse MoE: Reconciling Expressiveness and Efficiency

K-EXAONE departs from the dense modeling paradigm of the earlier EXAONE series and adopts the MoE architecture that is becoming standard for training models at the 100B+ scale. The core design philosophy is to reconcile high representational diversity with resource-efficient training and inference.

Concretely, each token is routed to the top-8 experts from a pool of 128, plus 1 shared expert, so 9 experts are active at once. Although the model has 236B total parameters, only about 23B are active, enabling far more efficient computation than a comparable dense model.

Two techniques address routing stability and expert utilization, which are critical in MoE structures. Sequence-level load balancing prevents tokens from piling onto particular experts, and a dropless routing policy guarantees that every token is dispatched to its experts without capacity limits. Together they are key to stabilizing gradient flow and improving convergence behavior in large-scale MoE training.
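To make the routing concrete, here is a minimal PyTorch-style sketch of top-k routing with a shared expert and a sequence-level balancing term. The tensor shapes, function name, and the balancing formulation are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def route_tokens(hidden: torch.Tensor, router_w: torch.Tensor, top_k: int = 8):
    """Sketch of fine-grained top-k routing (shapes and formulation are assumptions).

    hidden:   [num_tokens, d_model]   tokens of one sequence
    router_w: [d_model, num_experts]  num_experts = 128 in K-EXAONE
    """
    logits = hidden @ router_w                      # [tokens, 128]
    probs = F.softmax(logits, dim=-1)
    weights, experts = probs.topk(top_k, dim=-1)    # top-8 experts per token
    weights = weights / weights.sum(dim=-1, keepdim=True)

    # Sequence-level load-balancing auxiliary loss (one common formulation):
    # the fraction of tokens routed to each expert should track its mean router probability.
    num_experts = logits.size(-1)
    load = F.one_hot(experts, num_experts).float().sum(dim=1).mean(dim=0)  # routed fraction
    importance = probs.mean(dim=0)                                          # mean probability
    balance_loss = num_experts * (load * importance).sum()

    # A shared expert is applied to every token on top of the routed top-8, and a
    # dropless policy means no token is dropped for exceeding expert capacity.
    return experts, weights, balance_loss
```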

Hybrid Attention and Block Structure

K-EXAONE's main block consists of 48 layers in total, following a hybrid attention design that mixes 36 sliding window attention (SWA) layers with 12 global attention (GA) layers. Compared to applying GA in every layer, this greatly reduces memory consumption and compute overhead while preserving long-context modeling ability.

Expressed in text, the block structure flows as follows.

Input tokens → Embedding
  → [SWA + Sparse MoE] × 3 layers
  → [GA + Sparse MoE] × 1 + [SWA + Sparse MoE] × 3, repeated × 12
  → [GA + Sparse MoE] × 1
  → RMSNorm → LM Head → Output

* Inside each block: Attention → RMSNorm → Sparse MoE (128 experts, top-8 + 1 shared) → RMSNorm
* The first layer uses a Dense FFN (hidden size: 18,432) instead of MoE → secures training stability

Several design decisions stand out here. Implementing the first layer as a dense FFN prevents instability early in MoE training. Shrinking the SWA window from the previous 4,096 to 128 drastically cuts KV-cache usage during long-context inference while preserving modeling capacity. Attention uses a grouped query attention (GQA) configuration with 64 query heads and 8 key-value heads, with a head dimension of 128.
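As a small illustration of what a 128-token sliding window means in practice, the sketch below builds the boolean attention mask an SWA layer would use; the function and the GQA shape comment are assumptions for illustration, not the model code.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int = 128) -> torch.Tensor:
    """True where a query may attend to a key: causal, and no further back than
    `window - 1` tokens (the SWA layers described above)."""
    idx = torch.arange(seq_len)
    rel = idx[:, None] - idx[None, :]      # query index minus key index
    return (rel >= 0) & (rel < window)

# GQA shape sketch (illustrative): 64 query heads share 8 KV heads with head dim 128,
# so the KV cache stores 8 * 128 values per token per layer instead of 64 * 128.
mask = sliding_window_causal_mask(seq_len=1024)   # [1024, 1024], at most 128 True per row
```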

Two techniques are carried over from EXAONE 4.0 to strengthen training stability and long-context extrapolation. QK Norm applies layer normalization to the query/key vectors before the attention operation, preventing attention logits from exploding in deep networks. SWA-only RoPE applies rotary positional embeddings selectively to the SWA layers only, avoiding interference with global token interactions in the GA layers and making long-sequence extrapolation more robust.

Multi-Token Prediction (MTP) Module

K-EXAONE integrates a dense-layer Multi-Token Prediction (MTP) module with an auxiliary objective that predicts not only the current target but also the future token $x_{t+1}$. The module plays a dual role: during training it acts as an auxiliary loss that improves future-token prediction, and at inference it is used for self-drafting, yielding roughly 1.5× the decoding throughput of standard autoregressive decoding.

The MTP block itself is lightweight at 0.52B parameters, and its dense-layer design minimizes routing overhead and memory consumption. It shares the LM head and embedding of the main block, further reducing the extra parameter burden.
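A rough sketch of how the auxiliary objective could be combined with the main loss follows. The exact offset the MTP head predicts and the tensor layout are assumptions; only the 0.05 loss weight comes from the report.

```python
import torch
import torch.nn.functional as F

def lm_with_mtp_loss(main_logits: torch.Tensor, mtp_logits: torch.Tensor,
                     tokens: torch.Tensor, mtp_weight: float = 0.05) -> torch.Tensor:
    """main_logits / mtp_logits: [batch, seq, vocab]; tokens: [batch, seq].

    The main head is trained on the standard next-token target, while the MTP head
    (one common formulation) is trained to predict one step further ahead.
    """
    vocab = main_logits.size(-1)
    lm_loss = F.cross_entropy(main_logits[:, :-1].reshape(-1, vocab),
                              tokens[:, 1:].reshape(-1))
    mtp_loss = F.cross_entropy(mtp_logits[:, :-2].reshape(-1, vocab),
                               tokens[:, 2:].reshape(-1))
    # At inference the same head drafts tokens that the main model then verifies
    # (self-drafting), which is where the ~1.5x decoding throughput comes from.
    return lm_loss + mtp_weight * mtp_loss
```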

Architecture Configuration Summary

| Component | Setting | Value |
|---|---|---|
| Main Block | Layers (Total / SWA / GA) | 48 / 36 / 12 |
| | Sliding Window Size | 128 |
| | Attention Heads (Q / KV) | 64 / 8 |
| | Head Dimension | 128 |
| | Experts (Total / Shared / Activated) | 128 / 1 / 8 |
| | Parameters (Total / Activated) | 236B / 23B |
| MTP Block | Attention Heads (Q / KV) | 64 / 8 |
| | Head Dimension | 128 |
| | Parameters | 0.52B |

Tokenizer Redesign: SuperBPE and a 150K Vocabulary

K-EXAONE completely redesigns its tokenizer, expanding the vocabulary from the previous 100K to 150K. The core of the strategy is to keep the high-frequency 70% of the existing vocabulary and reallocate the remaining capacity to the added languages, STEM (Science, Technology, Engineering, Mathematics), and code domains.

In particular, a SuperBPE strategy encodes frequent word sequences as single tokens (superwords). These superword tokens make up about 20% of the vocabulary and are allocated across English : Korean : other languages in a 2:3:1 ratio. Giving Korean the largest share reflects the design intent of a sovereign AI.

The pre-tokenization regex is also updated to handle superword boundaries, line breaks, and multilingual Unicode characters, and Unicode normalization is switched from NFKC to NFC. NFC is chosen to preserve the semantic distinctions of superscript-, subscript-, and symbol-heavy text that is common in code and STEM corpora.
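The NFC-versus-NFKC difference is easy to see directly; the standard-library snippet below shows how NFKC folds away exactly the superscripts and symbols that matter in STEM and code text (the sample strings are my own, not from the report).

```python
import unicodedata

for s in ["x²", "ℕ", "ﬁle", "½"]:
    nfc = unicodedata.normalize("NFC", s)
    nfkc = unicodedata.normalize("NFKC", s)
    print(f"{s!r}: NFC={nfc!r}, NFKC={nfkc!r}")

# NFKC folds compatibility characters ("x²" -> "x2", "ℕ" -> "N", "ﬁle" -> "file",
# "½" -> "1⁄2"), erasing distinctions that matter in code and STEM text; NFC keeps them.
```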

As a result, token efficiency measured in bytes per token improves by about 30% on average across all domains: +49.8% for multilingual text, +29.0% for Korean, +26.7% for code, +20.1% for STEM, and +19.6% for English.
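For reference, bytes per token is simply the UTF-8 length of a text divided by the number of tokens the tokenizer emits for it. The numbers below are hypothetical and chosen only to illustrate what a +29% Korean-domain gain would look like.

```python
def bytes_per_token(text: str, num_tokens: int) -> float:
    """UTF-8 bytes encoded per emitted token; higher means a more compact tokenizer."""
    return len(text.encode("utf-8")) / num_tokens

# Hypothetical example: the same 3,100-byte Korean passage needs 1,000 tokens with the
# old tokenizer and 775 with the new one.
old_bpt, new_bpt = 3100 / 1000, 3100 / 775
print(f"bytes/token: {old_bpt:.2f} -> {new_bpt:.2f} ({new_bpt / old_bpt - 1:+.1%})")  # ~ +29%
```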

Training Pipeline

Pre-training: A Three-Stage Curriculum

K-EXAONE is pre-trained on 11T tokens at a scale of 1.52 × 10^24 FLOPs, building foundational knowledge → domain expertise → reasoning ability progressively through a three-stage curriculum. It inherits EXAONE 4.0's data pipeline and applies multiple layers of data filtering to secure high-quality data.

A notable point of the training setup is that native FP8 training matched the training loss curve of BF16, showing that FP8 training can converge in a fully quantization-aware fashion while preserving optimization stability. The Muon optimizer is used with a Warmup-Stable-Decay (WSD) learning-rate schedule. Key hyperparameters are a peak learning rate of 3.0 × 10^-4, a MoE sequence auxiliary loss coefficient of 1.0 × 10^-4, an expert bias update factor of 1.0 × 10^-4, and an MTP loss weight of 0.05.
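The report only states that WSD with a 3.0 × 10^-4 peak learning rate is used, so the warmup and decay fractions in this sketch are placeholders; it just shows the shape of a Warmup-Stable-Decay schedule.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 3.0e-4,
           warmup_frac: float = 0.01, decay_frac: float = 0.10, min_lr: float = 0.0) -> float:
    """Warmup-Stable-Decay: linear warmup, long constant plateau, short final decay.
    warmup_frac / decay_frac / min_lr are hypothetical values, not from the report."""
    warmup_steps = max(int(total_steps * warmup_frac), 1)
    decay_steps = max(int(total_steps * decay_frac), 1)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        return peak_lr * step / warmup_steps      # linear warmup
    if step < stable_end:
        return peak_lr                            # stable plateau
    frac = min((step - stable_end) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * frac    # linear decay
```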

For multilingual expansion, synthetic corpora are created via cross-lingual knowledge transfer. Because the pre-training data distribution differs sharply across languages, synthetic corpora that propagate expert knowledge and reasoning patterns between languages are generated to ensure uniform performance regardless of the input language.

Thinking-augmented data synthesis additionally injects explicit reasoning supervision into the pre-training data. Document-grounded thinking trajectories are generated and combined with the source content into unified samples that encode step-by-step reasoning. These thinking-augmented corpora promote the transfer of reasoning behavior and amplify the effect of subsequent post-training.

Context Length Extension: 8K → 32K → 256K

K-EXAONE supports up to 256K tokens through a two-stage context length extension. The base model is pre-trained at 8K tokens, then extended from 8K to 32K in Stage 1 and from 32K to 256K in Stage 2. Both stages share the same three data components, with sampling ratios adjusted to each stage's goal and stability requirements.

The rehearsal dataset is the key component for preventing short-context degradation, the biggest risk of long-context specialization. It includes high-quality samples drawn from the pre-training distribution and provides a consistent training signal that anchors the model's behavior on short contexts. It appears in both stages, with per-stage ratios tuned so that the long-context training signal is still sufficiently represented.

The synthetic reasoning dataset consists of challenging problems in math, science, and competitive programming, and includes synthetic reasoning content that encourages learning intermediate reasoning patterns rather than only final answers. It is incorporated throughout the context-extension process so that reasoning quality improves even under long inputs.

The long-document dataset consists of full document sequences that fit within a single training instance. Entire long documents are trained end to end without truncation to encourage capturing long-range dependencies. Stage 1 prioritizes stable performance up to 32K, while Stage 2 raises the share of long-document samples to model dependencies up to 256K.

For quality verification, short-context evaluations with the same protocol as pre-training and Needle-In-A-Haystack (NIAH) tests are run systematically. Training is iterated until near-perfect NIAH performance ("green light") is reached over each stage's target context range, confirming that K-EXAONE extends to 256K tokens without degrading overall performance.
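A minimal version of such a NIAH probe might look like the sketch below: hide one retrievable fact at a random depth in filler text of the target length, then check whether the model can quote it back. The prompt wording and pass criterion are assumptions, not the report's exact protocol.

```python
import random

def build_niah_sample(filler_text: str, needle: str, context_chars: int,
                      rng: random.Random) -> tuple[str, str]:
    """Return (context, question) with `needle` buried at a random depth."""
    filler = (filler_text * (context_chars // max(len(filler_text), 1) + 1))[:context_chars]
    depth = rng.randint(0, len(filler))
    context = filler[:depth] + "\n" + needle + "\n" + filler[depth:]
    question = "What is the secret passphrase mentioned somewhere in the document above?"
    return context, question

# A stage would count as "green" once retrieval succeeds at (nearly) every tested
# length/depth combination within its target context range.
rng = random.Random(0)
ctx, q = build_niah_sample("Lorem ipsum dolor sit amet. ",
                           "The secret passphrase is tangerine-07.", 200_000, rng)
```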

Post-training: SFT → RL → Preference Learning

Post-training consists of three stages. First, large-scale supervised fine-tuning (SFT) teaches the model to follow diverse user instructions and generate responses. Tasks are categorized into multiple domains, and generation methods or expert models suited to each are used. To strengthen Korean-specific capability, public and institutional data provided by the Ministry of Science and ICT (MSIT), the National Information Society Agency (NIA), and the Korea Data Agency (K-DATA) are used.

For agentic tool-use training, a synthetic tool environment is built with LLMs to avoid the high cost and inefficiency of constructing real tool environments. Synthetic environments containing tool-use scenarios and verifiable pass criteria are generated, then judged by an LLM to filter out unrealistic or unsolvable cases. This process yields hundreds of verifiable, realistic tool-use tasks and evaluation environments.

For web search, two sub-agents improve context efficiency. The summarizer sub-agent condenses fetched web pages so that K-EXAONE never has to process long, noisy web text directly. The trajectory compressor kicks in when the tool-calling history exceeds a predefined number of steps, compressing the entire interaction into a single structured JSON record of the key facts from tool outputs and the remaining research questions. This design keeps redundant tool results from being repeatedly re-exposed to K-EXAONE. Both sub-agents are implemented with the same base model as K-EXAONE at inference time.
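The trajectory-compressor idea can be pictured as follows; MAX_TOOL_STEPS, the field names, and the record layout are illustrative assumptions, since the report describes only the behavior (compress the history into one structured JSON record of key facts and open questions).

```python
import json

MAX_TOOL_STEPS = 8  # hypothetical threshold; the report says only "a predefined number of steps"

def maybe_compress_trajectory(tool_calls: list[dict], open_questions: list[str]) -> str:
    """Pass the raw history through while it is short; otherwise replace it with a single
    structured record so redundant tool outputs are not re-fed to the model."""
    if len(tool_calls) <= MAX_TOOL_STEPS:
        return json.dumps(tool_calls, ensure_ascii=False)
    record = {
        "key_facts": [c["summary"] for c in tool_calls if c.get("summary")],
        "remaining_questions": open_questions,
        "compressed_steps": len(tool_calls),
    }
    return json.dumps(record, ensure_ascii=False, indent=2)
```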

Reinforcement Learning: AGAPO

To strengthen reasoning, reinforcement learning (RL) with verifiable rewards is performed. Training runs in a multi-task setting covering math, code, STEM, and instruction following, and verification combines rule-based verifiers with LLM-as-a-Judge.

For optimization, AGAPO is adopted, which uses off-policy policy gradients with truncated importance sampling. The RL objective is defined as follows: for a question $q \sim P(Q)$, $G$ candidate responses $O = \{o_1, \ldots, o_G\}$ are sampled from the rollout policy $\pi_{\theta_{\text{rollout}}}$, and each response receives a verifiable reward $r_i \in [0, 1]$.

$$J_{\text{AGAPO}}(\theta) = \mathbb{E}_{q \sim P(Q),\, \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\text{rollout}}}(O \mid q)}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|} \mathrm{sg}\bigl(\min(\rho_{i,t}, \epsilon)\bigr)\, A_{\text{global},i}\, \log\pi_\theta(o_{i,t} \mid q, o_{i,<t})\right]$$

Here the importance ratio and advantages are computed as follows.

ฯi,t=ฯ€ฮธ(oi,tโˆฃq,oi,<t)ฯ€ฮธrollout(oi,tโˆฃq,oi,<t)\rho_{i,t} = \frac{\pi_\theta(o_{i,t} q, o_{i,<t})}{\pi_{\theta_{\text{rollout}}}(o_{i,t} q, o_{i,<t})}ฯi,tโ€‹=ฯ€ฮธrolloutโ€‹โ€‹(oi,tโ€‹โˆฃq,oi,<tโ€‹)ฯ€ฮธโ€‹(oi,tโ€‹โˆฃq,oi,<tโ€‹)โ€‹

$$A_{\text{group},i} = r_i - \frac{1}{G-1}\sum_{j \neq i} r_j, \qquad A_{\text{global},i} = \frac{A_{\text{group},i} - \mathrm{mean}(\{A_{\text{group},k}\}_k)}{\mathrm{std}(\{A_{\text{group},k}\}_k)}$$

The group-level advantage is computed first to capture the relative reward signal within each group, and global normalization is then applied to incorporate batch-level information. Key design decisions include zero-variance filtering (dropping prompts whose sampled rollouts all receive the same reward, which would otherwise yield zero advantage), removing the KL penalty (better performance and no unnecessary computation), and freezing the MoE router (the router stays fixed throughout RL training).
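Putting the formulas above into code, a per-prompt sketch of the advantage computation and the truncated-importance-weighted loss might look like this; the clipping value, the batch-level scope of the normalization, and all tensor layouts are assumptions.

```python
import torch

def agapo_advantages(rewards_per_prompt: list[torch.Tensor]) -> list[torch.Tensor]:
    """rewards_per_prompt: one [G] tensor of verifiable rewards per prompt in the batch.
    Leave-one-out group advantage, then standardization over the whole batch."""
    group_advs = []
    for r in rewards_per_prompt:
        if r.std() == 0:                 # zero-variance filtering: all rollouts agree,
            continue                     # so the prompt carries no learning signal
        G = r.numel()
        group_advs.append(r - (r.sum() - r) / (G - 1))   # r_i - mean of the others
    flat = torch.cat(group_advs)
    return [(a - flat.mean()) / flat.std() for a in group_advs]   # A_global per prompt

def agapo_token_loss(logp_new: torch.Tensor, logp_rollout: torch.Tensor,
                     adv_global: float, eps: float = 2.0) -> torch.Tensor:
    """Per-token term for one response: sg(min(rho, eps)) * A_global * log pi_theta.
    eps is a hypothetical truncation value."""
    ratio = torch.exp(logp_new - logp_rollout).detach()   # stop-gradient on the ratio
    return -(torch.clamp(ratio, max=eps) * adv_global * logp_new).mean()
```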

Preference Learning: GROUPER

After RL, a preference learning stage aligns the model with human preferences. It focuses on general alignment domains such as chat, safety, instruction following, agentic tool use, and creative writing while preserving reasoning performance. For this, the authors propose GROUPER (Group-wise SimPER), an improved variant of SimPER.

Inspired by GRPO, multiple responses are sampled per query and trained with group-wise advantages. Each response's preference reward combines rule-based rewards with rubric-based generative rewards that perform multi-dimensional evaluation. The objective is:

$$\mathcal{L}_{\text{GROUPER}}(\theta) = -\mathbb{E}_{x \sim P(X),\, \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\text{init}}}(O \mid x)}\left[\frac{1}{G}\sum_{i=1}^{G} A_{\text{pref},i}\, \exp\!\left(\frac{1}{|o_i|}\log\pi_\theta(o_i \mid x)\right)\right]$$

The advantage is computed by standardizing the preference rewards and scaling them to the range [-1, 1]:

$$z_i = \frac{r_{\text{pref},i} - \mathrm{mean}(\{r_{\text{pref},j}\}_{j=1}^{G})}{\mathrm{std}(\{r_{\text{pref},j}\}_{j=1}^{G})}, \qquad A_{\text{pref},i} = 2 \cdot \frac{z_i - \min(\{z_j\}_{j=1}^{G})}{\max(\{z_j\}_{j=1}^{G}) - \min(\{z_j\}_{j=1}^{G})} - 1 \in [-1, 1]$$

GROUPER combines SimPER's hyperparameter-free property with GRPO's group-wise sampling, improving alignment performance in general domains.
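Matching the formulas above, the GROUPER update reduces to a few lines; the epsilon guard and tensor layout are assumptions.

```python
import torch

def grouper_advantages(pref_rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Standardize the group's preference rewards, then min-max scale to [-1, 1]."""
    z = (pref_rewards - pref_rewards.mean()) / (pref_rewards.std() + eps)
    return 2 * (z - z.min()) / (z.max() - z.min() + eps) - 1

def grouper_loss(sum_logprobs: torch.Tensor, lengths: torch.Tensor,
                 advantages: torch.Tensor) -> torch.Tensor:
    """sum_logprobs: [G] summed token log-probs of each response under the current policy;
    lengths: [G] response lengths. SimPER-style length-normalized exponent, weighted by
    the group advantage."""
    per_token_logp = sum_logprobs / lengths        # (1/|o_i|) log pi_theta(o_i | x)
    return -(advantages * torch.exp(per_token_logp)).mean()
```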

Data Compliance

To minimize legal risks that can arise when collecting and using the large-scale data needed for AI model development, such as copyright infringement, intellectual property infringement, and privacy violations, LG AI Research conducts AI compliance reviews across the entire process of data collection, AI model training, and information provision.

Evaluation Results

Benchmarks and Evaluation Setup

K-EXAONE is evaluated on a comprehensive benchmark suite spanning nine categories: World Knowledge (MMLU-PRO, GPQA-DIAMOND, HUMANITY'S LAST EXAM), Math (IMO-ANSWERBENCH, AIME 2025, HMMT NOV 2025), Coding/Agentic Coding (LIVECODEBENCH PRO, LIVECODEBENCH V6, TERMINAL-BENCH 2.0, SWE-BENCH VERIFIED), Agentic Tool Use (τ²-BENCH, BROWSECOMP), Instruction Following (IFBENCH, IFEVAL), Long Context Understanding (AA-LCR, OPENAI-MRCR), Korean (KMMLU-PRO, KOBALT, CLICK, HRM8K, KO-LONGBENCH), Multilinguality (MMMLU, WMT24++), and Safety (WILDJAILBREAK, KGC-SAFETY).

Evaluation uses temperature 1.0 and top-p 0.95, with a 160K context length for the long-context understanding benchmarks and 128K for the rest. MTP is disabled during inference.

Key Results in Reasoning Mode

The comparison models are EXAONE 4.0 (32B dense), gpt-oss-120b (117B MoE, 5.1B active), Qwen3-235B-A22B-Thinking-2507 (235B MoE, 22B active), and DeepSeek-V3.2 (671B MoE, 37B active).

| Benchmark | K-EXAONE | EXAONE 4.0 | gpt-oss-120b | Qwen3-235B | DeepSeek-V3.2 |
|---|---|---|---|---|---|
| MMLU-PRO | 83.8 | 81.8 | 80.7 | 84.4 | 85.0 |
| AIME 2025 | 92.8 | 85.3 | 92.5 | 92.3 | 93.1 |
| LiveCodeBench V6 | 80.7 | 66.7 | 81.9 | 74.1 | 79.4 |
| τ²-Bench (weighted) | 73.2 | 46.8 | 63.9 | 58.6 | 79.0 |
| IFBench | 67.3 | 36.0 | 69.5 | 52.6 | 62.5 |
| KoBALT | 61.8 | 25.4 | 54.3 | 56.1 | 62.7 |
| KGC-SAFETY | 96.1 | 58.0 | 92.5 | 66.2 | 73.0 |

In mathematical reasoning, K-EXAONE scores 92.8 on AIME 2025, edging out gpt-oss-120b (92.5) and Qwen3 (92.3) and coming close to DeepSeek-V3.2 (93.1), which has 37B active parameters. Reaching this level with 23B active parameters is impressive in terms of parameter efficiency.

On agentic tool use (τ²-Bench), the weighted average of 73.2 far exceeds gpt-oss-120b (63.9) and Qwen3 (58.6), suggesting that the synthetic-tool-environment training strategy paid off, although it still trails DeepSeek-V3.2 (79.0).

In instruction following, K-EXAONE records 67.3 on IFBench and 89.7 on IFEVAL, outperforming most of the comparison models.

On safety (KGC-SAFETY), it scores 96.1, decisively ahead of every comparison model: 3.6 points above gpt-oss-120b (92.5) and 20 to 30+ points above Qwen3 (66.2) and DeepSeek-V3.2 (73.0).

Korean and Multilingual Performance

On Korean benchmarks, K-EXAONE is strong among open-weight reasoning models, scoring 83.9 on CLICK (language and culture), 90.9 on HRM8K (olympiad-level math reasoning), and 86.8 on KO-LONGBENCH (long-context understanding). It does, however, trail Qwen3 (71.6) and DeepSeek-V3.2 (72.1) on KMMLU-PRO (67.3), which is worth noting: despite being a Korean-specialized model, it is not the best on a Korean professional-knowledge benchmark, leaving room for improvement.

In multilingual evaluation, it records 85.7 on MMMLU and 90.5 on WMT24++. Performance improves evenly across all languages relative to EXAONE 4.0, showing balanced multilingual capability with no language conspicuously weak or dominant.

Non-Reasoning Mode Highlights

In non-reasoning mode, the most striking result is long-context understanding. K-EXAONE reaches 45.2 on AA-LCR and 60.9 on OPENAI-MRCR, far above Qwen3 (31.2, 42.8) and DeepSeek-V3.2 (32.0, 42.4). This suggests the hybrid attention structure and the two-stage context extension strategy are particularly effective in the non-reasoning setting.

Improvements over EXAONE 4.0

In the transition from EXAONE 4.0 (32B dense) to K-EXAONE (236B MoE, 23B active), the most dramatic gains appear on τ²-Bench Telecom (23.7 → 73.5, +49.8p), KGC-SAFETY (58.0 → 96.1, +38.1p), KoBALT (25.4 → 61.8, +36.4p), and IFBench (36.0 → 67.3, +31.3p). These are read as the combined effect of MoE scaling and the improved post-training pipeline.

Areas for Improvement

The results also show clear room for improvement. In agentic coding (SWE-BENCH VERIFIED 49.4) there is a substantial gap to DeepSeek-V3.2 (73.1) and gpt-oss-120b (62.4). On HUMANITY'S LAST EXAM (13.6) it reaches only about half the score of DeepSeek-V3.2 (25.1), showing limits on the hardest knowledge-reasoning problems.

Safety Framework: K-AUT and KGC-SAFETY

One of K-EXAONE's most distinctive differentiators is a safety framework that systematically reflects Korean social and cultural context. To overcome the limitation that existing Western-centric AI risk taxonomies insufficiently capture the cultural sensitivities and context-specific needs of Korean society, the Korea-Augmented Universal Taxonomy (K-AUT) is proposed.

K-AUT comprises 4 main domains and 226 detailed risk areas. Universal Human Values (55) covers threats to life, dignity, and fundamental rights grounded in the UN Charter and international human-rights standards. Social Safety (75) assesses disruption of social order and deepening polarization. Korean Sensitivity (60) manages sensitive issues in Korea's cultural, historical, and geopolitical context, grounded in constitutional values, domestic law (such as the National Security Act), and verified historical consensus. Future Risk (36) addresses emerging threats from new technologies, based on international AI ethics principles and anticipatory risk research.

The KGC-SAFETY benchmark built on this framework contains 2,260 test instances, 10 for each of the 226 categories, covering multilingual (6 languages), multi-turn, adversarial, and general scenarios. Evaluation uses an LLM-as-a-Judge framework with a binary safe/unsafe judgment per test case.

In the detailed KGC-SAFETY results, most models show relatively high safe rates on Universal Human Values and Social Safety but lower ones on Future Risk and Korean Sensitivity. K-EXAONE keeps its safe rate above 94% in every domain (Universal Human Values 97.5, Social Safety 96.9, Korean Sensitivity 94.3, Future Risk 95.0), demonstrating that K-AUT-based, Korea-specific safety training was effective.

The paper positions this approach as a modular, scalable blueprint that other countries can adopt when developing their own sovereign AI. By systematically layering regional specificity on top of universal ethics, K-AUT's structure is designed to extend to each country's cultural context.

Limitations and Distribution

Like every LLM, K-EXAONE has limitations. It may produce inappropriate responses containing personal, harmful, or biased information, including biases related to age, gender, or race. Because it depends heavily on the statistics of its training data, it can generate semantically or syntactically inaccurate sentences, and it may give false or contradictory answers when the latest information is not reflected.

In terms of distribution, K-EXAONE is released under a non-exclusive, non-transferable, worldwide, irrevocable license that permits access, download, installation, modification, use, distribution, and creation of derivative works for both commercial and non-commercial purposes. Commercial redistribution, sublicensing, or provision to third parties, however, requires a separate agreement.

Conclusion

K-EXAONE is a sovereign AI model that demonstrates a globally competitive large-scale model can be built through government-private collaboration despite Korea's AI infrastructure constraints. Its layered innovations work together: efficient scaling via the MoE architecture (236B total / 23B active), 256K long-context processing on a hybrid attention backbone, an average 30% efficiency gain from the SuperBPE tokenizer, alignment through AGAPO RL and GROUPER preference learning, and Korea-specific safety built on the K-AUT framework. Across reasoning, agentic, multilingual, and safety evaluations it matches or exceeds similarly sized open-weight models.

Promising directions for follow-up work include analyzing K-EXAONE's MoE routing behavior (ablation studies of which experts specialize in which languages and domains), strengthening agentic coding, and an in-depth comparison between GROUPER and existing RLHF/DPO-style preference learning methods.

Thank you for reading.