[๊ฐ•์˜๋…ธํŠธ] RAG From Scratch : Query Translation

Posted by Euisuk's Dev Log on September 14, 2024

[๊ฐ•์˜๋…ธํŠธ] RAG From Scratch : Query Translation

์›๋ณธ ๊ฒŒ์‹œ๊ธ€: https://velog.io/@euisuk-chung/RAG-From-Scratch-5-9

  • This blog post covers Parts 5 through 9 of the RAG From Scratch coursework lectures.

    • Part 5 (Multi-Query): explains query rewriting techniques for retrieving a more diverse set of documents. 📌 Lecture 📖 Slides
    • Part 6 (RAG Fusion): introduces RAG Fusion, which combines multiple retrieval results to produce an improved ranking. 📌 Lecture 📖 Slides
    • Part 7 (Decomposition): discusses breaking a complex question into granular sub-questions to produce a more detailed answer. 📌 Lecture 📖 Slides
    • Part 8 (Step-Back): explores step-back prompting, which generates abstract questions that elicit more fundamental understanding. 📌 Lecture 📖 Slides
    • Part 9 (HyDE): introduces the HyDE technique, which generates hypothetical documents that better match the indexed documents. 📌 Lecture 📖 Slides

Part 5 (Multi-Query)

  • ํ•ด๋‹น ๊ฐ•์˜๋Š” RAG(Retrieval-Augmented Generation) ํŒŒ์ดํ”„๋ผ์ธ์˜ ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„์ธ โ€œQuery Translation(์ฟผ๋ฆฌ ๋ณ€ํ™˜)โ€์— ๋Œ€ํ•ด ๋‹ค๋ฃจ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

  • Query Translation: ์‚ฌ์šฉ์ž๊ฐ€ ์ž‘์„ฑํ•œ ์งˆ๋ฌธ์ด ๋ชจํ˜ธํ•˜๊ฑฐ๋‚˜ ์ œ๋Œ€๋กœ ๊ตฌ์กฐํ™”๋˜์ง€ ์•Š์€ ๊ฒฝ์šฐ, ๋ฌธ์„œ์—์„œ ์˜๋ฏธ์  ์œ ์‚ฌ์„ฑ์„ ๊ธฐ์ค€์œผ๋กœ ๊ฒ€์ƒ‰ํ•˜๋Š” ๊ณผ์ •์—์„œ ์›ํ•˜๋Š” ๋ฌธ์„œ๋ฅผ ์ฐพ์ง€ ๋ชปํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์งˆ๋ฌธ์„ ๋‹ค์–‘ํ•œ ๊ด€์ ์—์„œ ์žฌ์ž‘์„ฑํ•˜์—ฌ ๋ณด๋‹ค ํšจ๊ณผ์ ์œผ๋กœ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
    • ์•„๋ž˜ 3๊ฐ€์ง€ ๊ธฐ๋ฒ•์€ ๋Œ€ํ‘œ์ ์ธ Query Translation์˜ 3๊ฐ€์ง€ ๊ธฐ๋ฒ•์œผ๋กœ ์ด๋“ค์€ ๊ฐ๊ฐ ๋‹ค๋ฅด๊ฒŒ ์งˆ๋ฌธ์„ ๋ณ€ํ˜•ํ•˜์—ฌ ๊ฒ€์ƒ‰ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๊ธฐ๋ฒ•์œผ๋กœ, ๊ธฐ๋ณธ์ ์œผ๋กœ ์›๋ž˜ ์งˆ๋ฌธ์„ ์žฌ๊ตฌ์„ฑํ•˜๊ฑฐ๋‚˜ ๋ณ€ํ˜•ํ•˜๋Š” ๋ฐฉ์‹์ด๋ผ๋Š” ๊ณตํ†ต์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
      1. Query Rewriting(์ฟผ๋ฆฌ ์žฌ์ž‘์„ฑ): ์งˆ๋ฌธ์„ ๋‹ค์–‘ํ•œ ๊ด€์ ์—์„œ ๋‹ค์‹œ ์ž‘์„ฑํ•˜์—ฌ ๊ฒ€์ƒ‰ ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ๋‹ค์ค‘ ์ฟผ๋ฆฌ(multi-query) ๊ธฐ๋ฒ•์€ ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ๋ฐฉ์‹์œผ๋กœ ์งˆ๋ฌธ์„ ๋ณ€ํ™˜ํ•˜์—ฌ ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ์˜ ๋‹ค์–‘์„ฑ๊ณผ ์ •ํ™•์„ฑ์„ ๋†’์ด๋ ค๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
      2. Sub-questions(ํ•˜์œ„ ์งˆ๋ฌธ ์ƒ์„ฑ): ๋ณต์žกํ•˜๊ฑฐ๋‚˜ ์ถ”์ƒ์ ์ธ ์งˆ๋ฌธ์„ ๋” ์ž‘๊ณ  ๊ตฌ์ฒด์ ์ธ ํ•˜์œ„ ์งˆ๋ฌธ์œผ๋กœ ๋ถ„ํ•ดํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋” ์ •ํ™•ํ•˜๊ณ  ์„ธ๋ถ€์ ์ธ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Google์˜ โ€œleast-to-mostโ€ ๊ธฐ๋ฒ•์€ ๋ณต์žกํ•œ ์งˆ๋ฌธ์„ ๋” ์ž‘์€ ๋‹จ๊ณ„๋กœ ๋‚˜๋ˆ„์–ด ํ•ด๊ฒฐํ•˜๋Š” ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.
      3. Abstract Query(์ถ”์ƒ์ ์ธ ์งˆ๋ฌธ ์ƒ์„ฑ): ์งˆ๋ฌธ์„ ๋” ๋†’์€ ์ˆ˜์ค€์œผ๋กœ ์ถ”์ƒํ™”ํ•˜์—ฌ, ์ผ๋ฐ˜์ ์ด๊ฑฐ๋‚˜ ๊ด‘๋ฒ”์œ„ํ•œ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. โ€œStepback promptingโ€ ๊ธฐ๋ฒ•์€ ์งˆ๋ฌธ์„ ํ•œ ๋‹จ๊ณ„ ๋” ๋†’์€ ์ถ”์ƒํ™” ์ˆ˜์ค€์œผ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ๋ณด๋‹ค ๋„“์€ ๋ฒ”์œ„์˜ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค.
  • ๋ณธ ๊ฐ•์˜๋Š” โ€œ1. Query Rewriting(์ฟผ๋ฆฌ ์žฌ์ž‘์„ฑ)โ€ ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์— ์†ํ•˜๋Š” Multi-query(๋‹ค์ค‘ ์ฟผ๋ฆฌ)์— ์ค‘์ ์„ ๋‘๊ณ  ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.
    • ์—ฌ๊ธฐ์„œ ๋‹ค๋ฃจ๋Š” ํ•ต์‹ฌ ๊ฐœ๋…์€ ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ์„ ์—ฌ๋Ÿฌ ํ˜•ํƒœ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ๋ฌธ์„œ ๊ฒ€์ƒ‰ ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

  • The Multi-Query Approach

  • The basic intuition behind multi-query is to transform the question into several forms and run the search from multiple perspectives.
  • When a document and a question are poorly aligned in the high-dimensional embedding space, rewriting the question can make that document easier to retrieve.
  • In other words, rewriting the question in several ways improves retrieval performance.

Code Walkthrough

  • ์ด ๋ฐฉ์‹์€ ์—ฌ๋Ÿฌ ์žฌ์ž‘์„ฑ๋œ ์งˆ๋ฌธ์„ ๋…๋ฆฝ์ ์œผ๋กœ ๊ฒ€์ƒ‰ํ•œ ํ›„, ๊ฐ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ฉํ•˜์—ฌ ๋” ์‹ ๋ขฐ์„ฑ ์žˆ๋Š” ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ๋„์ถœํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.
  1. ๋ธ”๋กœ๊ทธ ๋ฌธ์„œ ๋กœ๋“œ ๋ฐ ๋ฒกํ„ฐ ์Šคํ† ์–ด ์ƒ์„ฑ/๊ฒ€์ƒ‰ ์ค€๋น„

    import bs4
    from langchain_community.document_loaders import WebBaseLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_openai import OpenAIEmbeddings
    from langchain_community.vectorstores import Chroma
       
    loader = WebBaseLoader(
        web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
        bs_kwargs=dict(
            parse_only=bs4.SoupStrainer(
                class_=("post-content", "post-title", "post-header")
            )
        ),
    )
       
    blog_docs = loader.load()
       
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=300,
        chunk_overlap=50
    )
       
    splits = text_splitter.split_documents(blog_docs)
       
    vectorstore = Chroma.from_documents(documents=splits,
                                        embedding=OpenAIEmbeddings())
       
    retriever = vectorstore.as_retriever()
    
    • ๋ธ”๋กœ๊ทธ ๋ฐ์ดํ„ฐ๋ฅผ ์›น์—์„œ ๊ฐ€์ ธ์™€ bs4๋ฅผ ์ด์šฉํ•ด ํŒŒ์‹ฑํ•œ ํ›„, ํ•ด๋‹น ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„ํ• ํ•˜์—ฌ ๋ฒกํ„ฐ ์Šคํ† ์–ด์— ์ธ๋ฑ์‹ฑํ•ฉ๋‹ˆ๋‹ค.
    • ๋ถ„ํ• ๋œ ๋ฌธ์„œ๋ฅผ ๋ฒกํ„ฐ ์Šคํ† ์–ด์— ์ €์žฅํ•˜๊ณ , ๊ฒ€์ƒ‰์„ ์œ„ํ•œ ์„ค์ •์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  2. Define the prompt for multi-query generation:

from langchain.prompts import ChatPromptTemplate

template = """
You are an AI language model assistant. 
Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database. 
    
By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search.

Provide these alternative questions separated by newlines. 
    
Original question: {question}
"""
    
prompt_perspectives = ChatPromptTemplate.from_template(template)
  • ์งˆ๋ฌธ์„ ๋‹ค์–‘ํ•œ ๋ฐฉ์‹์œผ๋กœ ๋‹ค์‹œ ์ž‘์„ฑํ•˜๋Š” ๋‹ค์ค‘ ์ฟผ๋ฆฌ ์ƒ์„ฑ์šฉ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
    • (ํ•ด์„) ๋‹น์‹ ์€ AI ์–ธ์–ด ๋ชจ๋ธ ์–ด์‹œ์Šคํ„ดํŠธ์ž…๋‹ˆ๋‹ค. ์ฃผ์–ด์ง„ ์‚ฌ์šฉ์ž ์งˆ๋ฌธ์˜ ๋‹ค์„ฏ ๊ฐ€์ง€ ๋ฒ„์ „์„ ์ƒ์„ฑํ•˜์—ฌ ๋ฒกํ„ฐ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์—์„œ ๊ด€๋ จ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•˜๋Š” ๊ฒƒ์ด ์ž‘์—…์˜ ๋ชฉํ‘œ์ž…๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž ์งˆ๋ฌธ์— ๋Œ€ํ•œ ์—ฌ๋Ÿฌ ๊ด€์ ์„ ์ƒ์„ฑํ•จ์œผ๋กœ์จ ์‚ฌ์šฉ์ž๊ฐ€ ๊ฑฐ๋ฆฌ ๊ธฐ๋ฐ˜ ์œ ์‚ฌ์„ฑ ๊ฒ€์ƒ‰์˜ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•  ์ˆ˜ ์žˆ๋„๋ก ๋•๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ž…๋‹ˆ๋‹ค. ์ƒˆ ์ค„๋กœ ๊ตฌ๋ถ„ํ•˜์—ฌ ๋‹ค์Œ ๋Œ€์ฒด ์งˆ๋ฌธ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
  3. Retrieve with the multiple queries and merge the documents:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain.load import dumps, loads

generate_queries = (
    prompt_perspectives
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    unique_docs = list(set(flattened_docs))
    return [loads(doc) for doc in unique_docs]

retrieval_chain = generate_queries | retriever.map() | get_unique_union

question = "What is task decomposition for LLM agents?"
docs = retrieval_chain.invoke({"question":question})

len(docs)

  • ์ƒ์„ฑ๋œ ์—ฌ๋Ÿฌ ์งˆ๋ฌธ์„ ์ด์šฉํ•ด ๋…๋ฆฝ์ ์ธ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ฉํ•˜์—ฌ ์ค‘๋ณต๋˜์ง€ ์•Š๋Š” ๋ฌธ์„œ๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

    1. generate_queries: ์ฃผ์–ด์ง„ ์งˆ๋ฌธ์— ๋Œ€ํ•ด ์—ฌ๋Ÿฌ ๊ด€์ ์˜ ์ฟผ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

      a. ChatOpenAI ๋ชจ๋ธ์„ ํ†ตํ•ด ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค.

      b. StrOutputParser๋กœ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ์„ ํŒŒ์‹ฑํ•ฉ๋‹ˆ๋‹ค.

      c. ๊ฒฐ๊ณผ๋ฅผ ๊ฐœํ–‰ ๋ฌธ์ž(\n)๋กœ ๋ถ„ํ• ํ•˜์—ฌ ์—ฌ๋Ÿฌ ์ฟผ๋ฆฌ๋กœ ๋งŒ๋“ญ๋‹ˆ๋‹ค.

    2. retriever: ์ƒ์„ฑ๋œ ์ฟผ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•ฉ๋‹ˆ๋‹ค.

      a. retriever.map()์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ์ƒ์„ฑ๋œ ์ฟผ๋ฆฌ์— ๋Œ€ํ•ด ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•ฉ๋‹ˆ๋‹ค.

      1
      
             
      
    3. get_unique_union: ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋“ค ์ค‘ ์ค‘๋ณต์„ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค.

      a. get_unique_unionย ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฒ€์ƒ‰๋œ ๋ชจ๋“  ๋ฌธ์„œ์—์„œ ์ค‘๋ณต์„ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค.

      • unique_docs = list(set(flattened_docs)) : ์ง‘ํ•ฉ(set)์„ ์‚ฌ์šฉํ•ด ์ค‘๋ณต์„ ์ œ๊ฑฐํ•œ ํ›„, ๋‹ค์‹œ ์›๋ž˜ ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

      b. dumps์™€ย loads๋ฅผ ์‚ฌ์šฉํ•จ์œผ๋กœ์จ, Document ๊ฐ์ฒด์˜ ๋‚ด์šฉ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ค‘๋ณต์„ ์ œ๊ฑฐํ•˜๊ณ , ๋‹ค์‹œ ์›๋ž˜์˜ ๊ฐ์ฒด ํ˜•ํƒœ๋กœ ๋ณต์›ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

      • ์ด๋Š” ํŠนํžˆ ๋ณต์žกํ•œ ๊ฐ์ฒด ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง„ Document ํด๋ž˜์Šค๋ฅผ ๋‹ค๋ฃฐ ๋•Œ ์œ ์šฉํ•ฉ๋‹ˆ๋‹ค.
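
For intuition, here is a minimal, self-contained sketch (assuming langchain-core is installed; the Document contents are made up) of how dumps/loads enable content-based de-duplication:

from langchain.load import dumps, loads
from langchain_core.documents import Document

docs = [
    Document(page_content="Task decomposition splits a complex task."),
    Document(page_content="Task decomposition splits a complex task."),  # duplicate
    Document(page_content="Agents combine planning, memory, and tools."),
]

# dumps() serializes each Document to a JSON string, so documents with
# identical content map to identical strings and a set can de-duplicate them;
# loads() then restores the surviving strings back into Document objects.
unique_docs = [loads(s) for s in set(dumps(d) for d in docs)]
print(len(unique_docs))  # 2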
  4. Final RAG (question + documents):
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

template = """
Answer the following question based on this context:
{context}

Question: 
{question}
"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(temperature=0)

# retrieval_chain = generate_queries | retriever.map() | get_unique_union
final_rag_chain = (
    {"context": retrieval_chain,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

  • ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต์„ ์ƒ์„ฑํ•˜๋Š” ์ตœ์ข… RAG ์ฒด์ธ์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
    1. retrieval_chain์ด ์ปจํ…์ŠคํŠธ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
    2. itemgetter("question")๊ฐ€ ์ž…๋ ฅ์—์„œ ์งˆ๋ฌธ์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
    3. ์ด ๋‘ ์š”์†Œ๊ฐ€ ํ”„๋กฌํ”„ํŠธ ํ…œํ”Œ๋ฆฟ์— ์‚ฝ์ž…๋ฉ๋‹ˆ๋‹ค.
    4. ์™„์„ฑ๋œ ํ”„๋กฌํ”„ํŠธ๊ฐ€ LLM์— ์ „๋‹ฌ๋˜์–ด ์‘๋‹ต์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

💡 A closer look at the itemgetter function

  1. The itemgetter function:
    • itemgetter from the operator module creates a callable that extracts the value for a given key or index from a dictionary or sequence.
    • Here, itemgetter("question") is used to extract the value of the "question" key from the input dictionary.
  2. How itemgetter interacts with .invoke:
    • When final_rag_chain.invoke({"question": question}) is called, itemgetter("question") extracts the value of the "question" key from that input dictionary.
    • The extracted value is passed to the "question" part of the chain.
  3. Its role inside the chain:
    • In {"context": retrieval_chain, "question": itemgetter("question")}, itemgetter("question") is assigned to the "question" key.
    • It simply passes the input dictionary's "question" value through unchanged.
  4. The overall flow:
    • When .invoke({"question": question}) is called, this dictionary is fed into the chain.
    • itemgetter("question") extracts the "question" value from the dictionary.
    • The extracted value fills the {question} slot in the prompt template.

Part 6 (RAG Fusion)

  • ์ด ๊ฐ•์˜๋Š” RAG(Retrieval-Augmented Generation) ํŒŒ์ดํ”„๋ผ์ธ์˜ โ€œQuery Translation(์ฟผ๋ฆฌ ๋ณ€ํ™˜)โ€ ๋‘ ๋ฒˆ์งธ ๋ฐฉ๋ฒ•์ธ RAG Fusion์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

  • RAG Fusion์€ ๋‹ค์ค‘ ์ฟผ๋ฆฌ์™€ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ฉํ•˜๋Š” ๊ณผ์ •์—์„œ ์ค‘์š”ํ•œ ๊ธฐ๋ฒ•์œผ๋กœ, Reciprocal Rank Fusion(RRF)์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋ฅผ ์žฌ์ •๋ ฌํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.
    • ์ด์ „์— ์ฑ•ํ„ฐ 5์—์„œ ์„ค๋ช…ํ•œ ๋‹ค์ค‘ ์ฟผ๋ฆฌ ๋ฐฉ๋ฒ•์—์„œ๋Š” ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ์„ ์—ฌ๋Ÿฌ ํ˜•ํƒœ๋กœ ๋ณ€ํ™˜ํ•œ ํ›„, ๊ฐ ๋ณ€ํ™˜๋œ ์งˆ๋ฌธ์œผ๋กœ ๋…๋ฆฝ์ ์ธ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค.
    • RAG Fusion๋„ ๊ธฐ๋ณธ์ ์œผ๋กœ ๊ฐ™์€ ๊ตฌ์กฐ๋ฅผ ๋”ฐ๋ฅด์ง€๋งŒ, Reciprocal Rank Fusion(RRF)์ด๋ผ๋Š” ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ๋ณด๋‹ค ํšจ์œจ์ ์œผ๋กœ ๊ฒฐํ•ฉํ•˜๊ณ  ์žฌ์ •๋ ฌํ•ฉ๋‹ˆ๋‹ค.
    • RRF์˜ ํ•ต์‹ฌ ๊ฐœ๋…์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋…๋ฆฝ์ ์ธ ๋ฌธ์„œ ๋ฆฌ์ŠคํŠธ์—์„œ ๊ฐ ๋ฌธ์„œ์˜ ๋žญํ‚น์„ ๊ณ„์‚ฐํ•˜๊ณ , ์ƒํ˜ธ ์ˆœ์œ„๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฌธ์„œ๋“ค์˜ ์ตœ์ข… ์ˆœ์œ„๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

RAG Fusion์˜ ์ง๊ด€

RAG Fusion์˜ ๊ธฐ๋ณธ ์›๋ฆฌ๋Š” ๋‹ค์ค‘ ์ฟผ๋ฆฌ ๋ฐฉ๋ฒ•๊ณผ ์œ ์‚ฌํ•˜๊ฒŒ ์—ฌ๋Ÿฌ ๋ฒˆ์˜ ๊ฒ€์ƒ‰์„ ํ†ตํ•ด ๊ฐ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋ฌธ์„œ ๋ฆฌ์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•œ ํ›„, RRF ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ๊ฐ ๋ฌธ์„œ์˜ ์ˆœ์œ„๋ฅผ ํ•ฉ์‚ฐํ•˜์—ฌ ์ตœ์ข… ์ˆœ์œ„๊ฐ€ ๋†’์€ ๋ฌธ์„œ๋ฅผ ์„ ํƒํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๊ฐ๊ฐ์˜ ๋…๋ฆฝ์ ์ธ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ์ตœ์ ํ™”ํ•˜์—ฌ ์ตœ์ข… ๋ฌธ์„œ๋ฅผ ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  1. ๋‹ค์ค‘ ์ฟผ๋ฆฌ ์ƒ์„ฑ: ํ•˜๋‚˜์˜ ์งˆ๋ฌธ์— ๋Œ€ํ•ด ์—ฌ๋Ÿฌ ๋ณ€ํ˜•๋œ ์ฟผ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, โ€œ์ธ๊ณต์ง€๋Šฅ์˜ ๋ฐœ์ „โ€์ด๋ผ๋Š” ์งˆ๋ฌธ์— ๋Œ€ํ•ด โ€œAI์˜ ๋ฏธ๋ž˜โ€, โ€œ๊ธฐ๊ณ„ ํ•™์Šต์˜ ์—ญ์‚ฌโ€ ๋“ฑ ๋‹ค์–‘ํ•œ ์ฟผ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  2. ๊ฐœ๋ณ„ ๊ฒ€์ƒ‰ ์ˆ˜ํ–‰: ๊ฐ ๋ณ€ํ˜•๋œ ์ฟผ๋ฆฌ๋กœ ๋…๋ฆฝ์ ์ธ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, ํ•˜๋‚˜์˜ ์ฟผ๋ฆฌ๊ฐ€ ์•„๋‹ˆ๋ผ ์—ฌ๋Ÿฌ ์ฟผ๋ฆฌ๋กœ ๊ฒ€์ƒ‰์„ ์ง„ํ–‰ํ•˜์—ฌ ๋” ๋‹ค์–‘ํ•œ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค.
  3. RRF ์ ์šฉ: ์ด๋ ‡๊ฒŒ ์–ป์€ ์—ฌ๋Ÿฌ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ RRF ๋ฐฉ์‹์œผ๋กœ ํ†ตํ•ฉํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ๋ฌธ์„œ์˜ ์ˆœ์œ„๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ ์ˆ˜๋ฅผ ๋งค๊ธฐ๊ณ , ์ƒ์œ„ ๋ฌธ์„œ์— ๋” ๋†’์€ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ•˜๋ฉด์„œ ์ตœ์ข… ์ˆœ์œ„๋ฅผ ์žฌ์ •๋ ฌํ•ฉ๋‹ˆ๋‹ค.
  4. ๊ฒฐ๊ณผ ํ†ตํ•ฉ: ์žฌ์ •๋ ฌ๋œ ๋ฌธ์„œ๋“ค์„ ์ตœ์ข… ์ปจํ…์ŠคํŠธ๋กœ ์‚ฌ์šฉํ•˜์—ฌ LLM์— ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค. LLM์€ ์ด ๋ฌธ์„œ๋“ค์„ ๋ฐ”ํƒ•์œผ๋กœ ์ตœ์ข… ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

Reciprocal Rank Fusion (RRF)

  1. A ranking-based approach

RRF combines retrieval results based on "rank" rather than "score". Each document's position within a result list is what matters, and those positions determine which documents are judged most important overall.

  • Why rank instead of score?
    • Retrieval results are usually ordered by a relevance score, but different retrieval algorithms can use different score scales.
    • For example, one search engine may score from 0 to 1 while another scores from 0 to 100. With mismatched scales, combining results by raw score is difficult.
    • Ranks, by contrast, use only the ordering produced by each engine, which sidesteps the scale problem.
  2. How the RRF score is computed
  • In RRF, each document's score is computed as:

    score(d) = \sum \frac{1}{k + \text{rank}(d)}

  • Here k is a small constant (typically around 60) and rank(d) is the rank of document d in each result list. The higher a document is ranked, the larger its score.
  • ์ด ๊ณต์‹์ด ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋Š”์ง€ ์˜ˆ๋ฅผ ๋“ค์–ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:

    • ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ์—์„œ ์–ด๋–ค ๋ฌธ์„œ๊ฐ€ ์ฒซ ๋ฒˆ์งธ ์ˆœ์œ„์— ์žˆ์œผ๋ฉด, ์ ์ˆ˜๋Š” 160+1\frac{1}{60 + 1}60+11โ€‹ , ์ฆ‰ ์•ฝ 0.016์ž…๋‹ˆ๋‹ค.
    • ๋‘ ๋ฒˆ์งธ ์ˆœ์œ„๋ผ๋ฉด, ์ ์ˆ˜๋Š”160+2\frac{1}{60 + 2}60+21โ€‹ , ์ฆ‰ ์•ฝ 0.0157์ด ๋ฉ๋‹ˆ๋‹ค.
    • ์ด์ฒ˜๋Ÿผ ์ˆœ์œ„๊ฐ€ ๋‚ฎ์•„์งˆ์ˆ˜๋ก ์ ์ˆ˜๊ฐ€ ์ค„์–ด๋“œ๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ์ƒ์œ„์— ๋žญํฌ๋œ ๋ฌธ์„œ๊ฐ€ ๋” ํฐ ์˜ํ–ฅ์„ ๋ฏธ์น˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
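
As a quick sanity check, here is a minimal sketch of how the scores accumulate, with k = 60 and two hypothetical ranked lists of document IDs:

k = 60
ranked_lists = [["A", "B", "C"], ["B", "A", "D"]]  # two retrieval result lists

scores = {}
for docs in ranked_lists:
    for rank, doc in enumerate(docs, start=1):  # ranks start at 1 here
        scores[doc] = scores.get(doc, 0.0) + 1 / (k + rank)

# "A" and "B" each appear at ranks 1 and 2 across the two lists, so both
# score 1/61 + 1/62 and outrank "C" and "D", which appear only once.
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

(Note: the course code shown later enumerates ranks from 0 rather than 1, which slightly inflates each contribution but rarely changes the resulting order.)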

Code Walkthrough

1. Define the RAG Fusion prompt and generate multiple queries

from langchain.prompts import ChatPromptTemplate

template = """
You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n

Output (4 queries):
"""

prompt_rag_fusion = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_rag_fusion
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

  1. ๋‹ค์ค‘ ์ฟผ๋ฆฌ ์ƒ์„ฑ:
    • ChatPromptTemplate์„ ์‚ฌ์šฉํ•˜์—ฌ ์ž…๋ ฅ ์งˆ๋ฌธ์„ ๋ฐ”ํƒ•์œผ๋กœ ์—ฌ๋Ÿฌ ๊ฒ€์ƒ‰ ์ฟผ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•˜๋„๋ก ํ”„๋กฌํ”„ํŠธ๋ฅผ ์„ค๊ณ„ํ–ˆ์Šต๋‹ˆ๋‹ค.
  2. ์ฒด์ธ ๊ตฌ์„ฑ:
    • prompt_rag_fusion ChatOpenAI StrOutputParser() ์ˆœ์œผ๋กœ ์ฒด์ธ์„ ๊ตฌ์„ฑํ•˜์—ฌ ํ”„๋กฌํ”„ํŠธ ์‹คํ–‰, LLM ์‘๋‹ต ์ƒ์„ฑ, ๋ฌธ์ž์—ด ํŒŒ์‹ฑ์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  3. ๋ฆฌ์ŠคํŠธ๋กœ ๋ณ€ํ™˜:
    • (lambda x: x.split("\n"))ย ๋ถ€๋ถ„์ด LLM์˜ ์ถœ๋ ฅ์„ ์ค„๋ฐ”๊ฟˆ์„ ๊ธฐ์ค€์œผ๋กœ ๋ถ„๋ฆฌํ•˜์—ฌ ๋ฆฌ์ŠคํŠธ๋กœ ๋งŒ๋“ญ๋‹ˆ๋‹ค.

2. Define the Reciprocal Rank Fusion (RRF) function and run retrieval

from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Retrieve the current score of the document, if any
            previous_score = fused_scores[doc_str]
            # Update the score of the document using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] = previous_score + 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})
len(docs)
  • RRF ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ตฌํ˜„ํ•˜์—ฌ, ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋…๋ฆฝ์ ์ธ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ ๋ฆฌ์ŠคํŠธ์—์„œ ๋ฌธ์„œ๋ฅผ ํ†ตํ•ฉํ•˜๊ณ  ์ˆœ์œ„๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
    • ์ด์ค‘ for ๋ฌธ์„ ํ†ตํ•ด ๊ฐ๊ฐ result์™€ docs๋ฅผ ๋Œ๋ฉด์„œ ๋“ฑ์žฅํ•˜๋Š” ๋ฌธ์„œ์˜ ์ˆœ์„œ์— ๋”ฐ๋ผ ์ ์ˆ˜๋ฅผ ๋งค๊น๋‹ˆ๋‹ค.
    • (์ฐธ๊ณ ) ์ด์ „์— ๊ฐœ๋…์—์„œ ์‚ดํŽด๋ดฃ๋˜ ๊ฒƒ์ฒ˜๋Ÿผ RRF์˜ score๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:
      • score=โˆ‘1k+rank(d)score = \sum \frac{1}{k + rank(d)}score=โˆ‘k+rank(d)1โ€‹
    • ์žฌ ๋“ฑ์žฅํ•œ ๋ฌธ์„œ์— ๋Œ€ํ•ด์„œ๋Š” ์ด์ „ score(previous score)์— ํ˜„์žฌ ์Šค์ฝ”์–ด(current score)๋ฅผ ๋”ํ•ด์ฃผ๋Š” ํ˜•ํƒœ๋กœ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค
  • ์ƒ์„ฑ๋œ ์ฟผ๋ฆฌ๋ฅผ ํ†ตํ•ด ๋…๋ฆฝ์ ์ธ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ๋ฅผ RRF ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ์žฌ์ •๋ ฌํ•ฉ๋‹ˆ๋‹ค.

3. ์ตœ์ข… RAG ์ฒด์ธ ์ •์˜

from langchain_core.runnables import RunnablePassthrough

template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

  • ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต์„ ์ƒ์„ฑํ•˜๋Š” ์ตœ์ข… RAG ์ฒด์ธ์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

Summary

  • RAG Fusion์€ ๋‹ค์ค‘ ์ฟผ๋ฆฌ๋ฅผ ์ด์šฉํ•ด ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋“ค์„ Reciprocal Rank Fusion(RRF) ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜์—ฌ ์žฌ์ •๋ ฌํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ตœ์ข… ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
  • RRF๋Š” ๋‹ค์–‘ํ•œ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ํ•˜๋‚˜๋กœ ํ†ตํ•ฉํ•˜์—ฌ, ๊ฐ ๋ฌธ์„œ์˜ ์ค‘์š”๋„๋ฅผ ์ˆœ์œ„๋กœ ํ™˜์‚ฐํ•˜๊ณ  ์ตœ์ ์˜ ๋ฌธ์„œ๋ฅผ ์„ ํƒํ•˜๋Š” ๋ฐ ์œ ๋ฆฌํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค.
  • ์ด ๋ฐฉ๋ฒ•์€ ํŠนํžˆ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ฒกํ„ฐ ์Šคํ† ์–ด์—์„œ ๋™์‹œ์— ๊ฒ€์ƒ‰ํ•˜๊ฑฐ๋‚˜, ๋‹ค์–‘ํ•œ ํ˜•ํƒœ์˜ ์งˆ๋ฌธ์— ๋Œ€ํ•ด ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•  ๋•Œ ์œ ์šฉํ•˜๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Interim Recap

  • So far we have looked at the multi-query technique and RAG Fusion. Let's recap before moving on.
  • Both techniques improve retrieval, but RAG Fusion focuses more on improving result quality, while the multi-query technique focuses more on widening retrieval coverage.
  • Depending on the situation, you can pick the appropriate technique or combine the two.

RAG Fusion

  • Query generation: uses an LLM to generate several related queries from the original query.
  • Retrieval: runs a separate search for each generated query.
  • Result merging: combines and reranks the result lists with the Reciprocal Rank Fusion (RRF) algorithm.
  • Strength: queries from multiple perspectives yield more comprehensive results, and RRF improves result quality.

Multi-Query Retriever

  • Query generation: uses an LLM to generate several related queries from the original query.
  • Retrieval: runs a separate search for each generated query.
  • Result merging: simply concatenates all retrieved documents, or merges them after removing duplicates.
  • Strength: retrieves a more diverse set of relevant documents than a single query, improving recall.

Part 7 (Decomposition)

  • ์ด ๊ฐ•์˜๋Š” RAG(Retrieval-Augmented Generation) ํŒŒ์ดํ”„๋ผ์ธ์˜ โ€œQuery Translation(์ฟผ๋ฆฌ ๋ณ€ํ™˜)โ€ ์ค‘ ์„ธ ๋ฒˆ์งธ ๋ฐฉ๋ฒ•์ธ Decomposition(์งˆ๋ฌธ ๋ถ„ํ•ด)์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

๋ฌธ์ œ ์ •์˜ ๋ฐ ์ ‘๊ทผ ๋ฐฉ๋ฒ•

  • Decomposition(์งˆ๋ฌธ ๋ถ„ํ•ด)๋Š” ๋ณต์žกํ•œ ์งˆ๋ฌธ์„ ์—ฌ๋Ÿฌ ํ•˜์œ„ ์งˆ๋ฌธ์œผ๋กœ ๋‚˜๋ˆ„์–ด ๊ฐ๊ฐ์„ ๋…๋ฆฝ์ ์œผ๋กœ ํ•ด๊ฒฐํ•œ ํ›„, ์ตœ์ข…์ ์œผ๋กœ ํ†ตํ•ฉํ•˜์—ฌ ๋‹ต๋ณ€์„ ์ œ๊ณตํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.
  • ์ด์ „ ๋ฐฉ๋ฒ•๋“ค์ธ ๋‹ค์ค‘ ์ฟผ๋ฆฌ(Multi-query)์™€ RAG Fusion์—์„œ๋Š” ์งˆ๋ฌธ์„ ์—ฌ๋Ÿฌ ๋ฐฉ์‹์œผ๋กœ ๋ณ€ํ™˜(rewrite-question)ํ•˜์—ฌ ๊ฒ€์ƒ‰ ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•˜๋ ค ํ–ˆ์Šต๋‹ˆ๋‹ค.

  • Decomposition ๋ฐฉ์‹, ๋‹ค๋ฅธ ์ด๋ฆ„์œผ๋กœ๋Š” sub-question๋ฐฉ์‹์€ ๊ธฐ์กด ์งˆ๋ฌธ์„ ํ•˜์œ„ ๋ฌธ์ œ๋กœ ๋ถ„ํ•ดํ•˜์—ฌ ๊ฐ ๋ฌธ์ œ๋ฅผ ์ˆœ์ฐจ์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋Š” ์ ‘๊ทผ์ž…๋‹ˆ๋‹ค.
    • ์ด ๋ฐฉ๋ฒ•์€ ์ฃผ๋กœ ๋ณต์žกํ•œ ๋ฌธ์ œ๋‚˜ ์งˆ๋ฌธ์„ ํ•ด๊ฒฐํ•  ๋•Œ ์œ ์šฉํ•˜๋ฉฐ, ๊ฐ๊ฐ์˜ ํ•˜์œ„ ์งˆ๋ฌธ์„ ๋…๋ฆฝ์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋ฉด์„œ ์ด์ „ ์งˆ๋ฌธ์˜ ๋‹ต์„ ๋ฐ”ํƒ•์œผ๋กœ ๋‹ค์Œ ์งˆ๋ฌธ์„ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.
    • ์ฃผ์š” ์—ฐ๊ตฌ ๋ฐ ๋ฐฉ๋ฒ•๋ก ์œผ๋กœ๋Š” โ€œLeast-to-Mostโ€์™€ โ€œIT-CoTโ€๊ธฐ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค.

  • ๋…ผ๋ฌธ: Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
    • ์ตœ์†Œ-์ตœ๋Œ€ ํ”„๋กฌํ”„ํŒ…(Least-to-Most Prompting)์€ ๋ณต์žกํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ฌธ์ œ๋ฅผ ๋” ์ž‘์€ ํ•˜์œ„ ๋ฌธ์ œ๋กœ ๋‚˜๋ˆˆ ํ›„, ๊ฐ ํ•˜์œ„ ๋ฌธ์ œ๋ฅผ ์ˆœ์ฐจ์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
    • ์ด์ „ ํ•˜์œ„ ๋ฌธ์ œ์˜ ๋‹ต์„ ๋ฐ”ํƒ•์œผ๋กœ ๋‹ค์Œ ํ•˜์œ„ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ, ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ์ด ์–ด๋ ค์šด ๋ฌธ์ œ๋ฅผ ๋” ์‰ฝ๊ฒŒ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.
    • Chain-of-Thought(์—ฐ์‡„์  ์‚ฌ๊ณ  ํ”„๋กฌํ”„ํŒ…)์™€ ๊ฐ™์€ ๊ธฐ์กด ๊ธฐ๋ฒ•์€ ๋” ์–ด๋ ค์šด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐ ํ•œ๊ณ„๊ฐ€ ์žˆ์—ˆ์œผ๋‚˜, ์ตœ์†Œ-์ตœ๋Œ€ ํ”„๋กฌํ”„ํŒ…์€ ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋„๋ก ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
  • ๋…ผ๋ฌธ: Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
    • IRCoT๋Š” ๋‹ค๋‹จ๊ณ„ ์งˆ๋ฌธ์— ๋‹ตํ•  ๋•Œ, ์ •๋ณด ๊ฒ€์ƒ‰๊ณผ Chain-of-Thought(COT) ์ถ”๋ก ์„ ์ƒํ˜ธ ๋ณด์™„์ ์œผ๋กœ ๊ฒฐํ•ฉํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
    • ๋ชจ๋ธ์ด ๋‹ต์„ ๋„์ถœํ•˜๋Š” ์ค‘๊ฐ„ ๋‹จ๊ณ„์—์„œ ํ•„์š”ํ•œ ์ •๋ณด๋ฅผ ์ง€์†์ ์œผ๋กœ ๊ฒ€์ƒ‰ํ•ด ์˜ค๊ณ , ๊ฒ€์ƒ‰๋œ ์ •๋ณด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ƒˆ๋กœ์šด CoT ๋‹จ๊ณ„๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ์ด๋ฅผ ๋ฐ˜๋ณต์ ์œผ๋กœ ๊ฐœ์„ ํ•ฉ๋‹ˆ๋‹ค.
    • ๋‹จ๊ณ„๋ณ„๋กœ ํ•„์š”ํ•œ ์ •๋ณด๋ฅผ ๊ฒ€์ƒ‰ํ•˜๊ณ , ์ถ”๋ก  ๊ณผ์ •์„ ํ†ตํ•ด ์–ป์€ ์ •๋ณด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ถ”๊ฐ€ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๋ณต์žกํ•œ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต๋ณ€์„ ๊ฐœ์„ ํ•ฉ๋‹ˆ๋‹ค.
  • ์ด๋Ÿฌํ•œ ๊ฐœ๋…๋“ค์„ ์ข…ํ•ฉํ•ด์„œ ์•„๋ž˜์™€ ๊ฐ™์ decomposition ์ปจ์…‰์„ ๊ทธ๋ ค๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

Decomposition ๋ฐฉ์‹์˜ ์ง๊ด€

  • Decomposition ๋ฐฉ์‹์—์„œ๋Š” ์งˆ๋ฌธ์„ ์ž‘๊ฒŒ ๋‚˜๋ˆ„์–ด ๋” ์‰ฝ๊ฒŒ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ํ•˜์œ„ ๋ฌธ์ œ๋กœ ๋ถ„ํ•ดํ•˜๊ณ , ๊ฐ๊ฐ์˜ ๋ฌธ์ œ์— ๋Œ€ํ•ด ๋…๋ฆฝ์ ์œผ๋กœ ๊ฒ€์ƒ‰ ๋ฐ ๋‹ต๋ณ€์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  • ์ด ๊ณผ์ •์—์„œ ์ด์ „ ์งˆ๋ฌธ์˜ ๋‹ต๋ณ€์„ ๋‹ค์Œ ์งˆ๋ฌธ์— ์‚ฌ์šฉํ•˜์—ฌ ์ ์ง„์ ์œผ๋กœ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด, ๋ณต์žกํ•œ ๋ฌธ์ œ๋ฅผ ์ฒด๊ณ„์ ์œผ๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Code Walkthrough

1. Define the decomposition prompt / generate sub-questions with the LLM

from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# prompt template for decomposition
template = """
You are a helpful assistant that generates multiple sub-questions related to an input question. \n

The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n

Generate multiple search queries related to: {question} \n

Output (3 queries):
"""

prompt_decomposition = ChatPromptTemplate.from_template(template)

llm = ChatOpenAI(temperature=0)

generate_queries_decomposition = (
    prompt_decomposition 
    | llm 
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

# Example question
question = "What are the main components of an LLM-powered autonomous agent system?"

questions = generate_queries_decomposition.invoke({"question":question})
  • ์ž…๋ ฅ๋œ ์งˆ๋ฌธ์„ ํ•˜์œ„ ์งˆ๋ฌธ์œผ๋กœ ๋ถ„ํ•ดํ•˜๊ธฐ ์œ„ํ•œ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
  • ํ”„๋กฌํ”„ํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ž…๋ ฅ๋œ ์งˆ๋ฌธ์„ ํ•˜์œ„ ์งˆ๋ฌธ์œผ๋กœ ๋ถ„ํ•ดํ•˜๊ณ , ์ด๋ฅผ ๋ฆฌ์ŠคํŠธ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

  • ์œ„ ์˜ˆ์‹œ๋ฅผ ๋ณด๋ฉด, generate_queries_decomposition๋ฅผ ํ†ตํ•ด์„œ โ€œLLM ๊ธฐ๋ฐ˜ ์ž์œจ ์—์ด์ „ํŠธ ์‹œ์Šคํ…œ์˜ ์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ๋Š” ๋ฌด์—‡์ธ๊ฐ€์š”?โ€ ์ด๋ผ๋Š” ์งˆ๋ฌธ์ด ์•„๋ž˜์™€ ๊ฐ™์ด 3๊ฐ€์ง€ ์งˆ๋ฌธ์œผ๋กœ ๋ถ„ํ•ด๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

    • โ€˜1. ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ(LLM)์˜ ํ•ต์‹ฌ ์š”์†Œ๋Š” ๋ฌด์—‡์ธ๊ฐ€์š”?
    • โ€˜2. ์ž์œจ ์—์ด์ „ํŠธ๋Š” ์–ด๋–ป๊ฒŒ LLM์„ ์•„ํ‚คํ…์ฒ˜์— ํ†ตํ•ฉํ•˜๋‚˜์š”?โ€™
    • โ€˜3. LLM ๊ธฐ๋ฐ˜ ์ž์œจ ์—์ด์ „ํŠธ ์‹œ์Šคํ…œ์˜ ์ฃผ์š” ๊ธฐ๋Šฅ์€ ๋ฌด์—‡์ธ๊ฐ€์š”?โ€™

2. Answer each sub-question in sequence, carrying earlier answers forward

from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

# prompt template for RAG
template = """
Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question: 

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

def format_qa_pair(question, answer):
    """
    Format question and answer pairs for inclusion in the prompt
    """
    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Initialize an empty string to accumulate question-answer pairs
q_a_pairs = ""

for q in questions:
    rag_chain = (
        {
            "context": itemgetter("question") | retriever, 
            "question": itemgetter("question"),
            "q_a_pairs": itemgetter("q_a_pairs")
        } 
        | decomposition_prompt
        | llm
        | StrOutputParser()
    )
    
    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair
    

  • ์œ„์—์„œ ์ƒ์„ฑ๋œ 3๊ฐœ์˜ ํ•˜์œ„ ์งˆ๋ฌธ์— ๋Œ€ํ•ด ์ˆœ์ฐจ์ ์œผ๋กœ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ์ด์ „ ์งˆ๋ฌธ์˜ ๋‹ต๋ณ€์„ ๋‹ค์Œ ์งˆ๋ฌธ์— ํ™œ์šฉํ•˜์—ฌ ์ ์ง„์ ์œผ๋กœ ํ•ด๊ฒฐํ•ด ๋‚˜๊ฐ‘๋‹ˆ๋‹ค.

    • q_a_pair = format_qa_pair(q, answer) ๋ฅผ ํ†ตํ•ด ์ด์ „ ๋‹ต๋ณ€์„ ๊ฐ์‹ธ๊ณ , rag_chain.invoke({โ€œquestionโ€: q, โ€œq_a_pairsโ€: q_a_pairs}) ์‹œ์— ๋„ฃ์–ด์คŒ์œผ๋กœ์จ ์ด์ „ ๋‹ต๋ณ€์„ ํ˜„์žฌ ๋‹ต๋ณ€์„ ํ•ด์ค„๋•Œ ์ฐธ๊ณ ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ค๋‹ˆ๋‹ค.
  • (์˜ˆ์‹œ) 2๋ฒˆ์งธ ํ”„๋กฌํ”„ํŠธ Input

    # 2๋ฒˆ ์งˆ๋ฌธ์˜ ๋‹ต๋ณ€์„ ์œ„ํ•ด q_a_pair๋กœ 1๋ฒˆ Question๊ณผ Answer๋ฅผ ์ฐธ๊ณ ํ•˜๊ณ  ์žˆ์Œ
    {
      "question": 
    "2. How do autonomous agents integrate LLMs into their architecture?",
      
      "q_a_pairs": 
    "\n---\nQuestion: 1. What are the core elements of a large language model (LLM)?\n
    Answer: The core elements of a large language model (LLM) include:\n\n1. **Architecture**: The foundational design of the LLM, typically involving layers of neural networks such as transformers. This architecture determines how the model processes and generates language.\n\n2. **Training Data**: The corpus of text data used to train the model. This data is crucial for the model to learn language patterns, grammar, facts, and even some reasoning capabilities.\n\n3. **Training Process**: The method by which the model learns from the training data, often involving techniques like supervised learning, unsupervised learning, or reinforcement learning. This process includes fine-tuning and adjusting the model's parameters to improve its performance.\n\n4. **Tokenization**: The process of breaking down text into smaller units (tokens) that the model can understand and process. Tokenization is essential for handling different languages, special characters, and various text structures.\n\n5. **Context Handling**: The mechanism by which the model understands and maintains the context of a conversation or text. This includes managing the finite context length and using techniques like attention mechanisms to focus on relevant parts of the input.\n\n6. **Memory**: Systems that allow the model to store and recall information beyond the immediate context window. This can involve techniques like vector stores and retrieval systems to access a larger knowledge pool.\n\n7. **Inference Mechanism**: The process by which the model generates responses based on the input it receives. This includes the model's ability to perform tasks like text generation, translation, summarization, and more.\n\n8. **Optimization and Planning**: For advanced applications, LLMs may include components for planning, breaking down tasks into subgoals, and refining actions based on self-reflection and feedback.\n\nThese elements work together to enable the LLM to perform a wide range of language-related tasks effectively."
    }
    
  • Final answer: the answer is refined progressively by answering questions 1 → 2 → 3 in order.

    • Skimming through the content, it looks about right.

      The essential technologies supporting an LLM-powered autonomous agent include:

      1. Large Language Models (LLMs):
      • Natural Language Interface: LLMs serve as the core controller or "brain" of the system, enabling the agent to understand, generate, and parse instructions and responses through natural language interactions. This interface facilitates communication between the LLM and external components such as memory systems, planning modules, and tools.
      2. Planning Technologies:
      • Task Decomposition: Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are used to break down complex tasks into smaller, manageable subgoals. This helps the agent plan and execute tasks step-by-step.
      • Reflection and Refinement: The agent performs self-criticism and self-reflection to learn from past actions, refine strategies, and improve the quality and efficiency of its outputs.
      3. Memory Systems:
      • Finite Context Length Handling: Due to the finite context length limitation of LLMs, mechanisms such as vector stores and retrieval are employed to access a larger knowledge pool and overcome context capacity constraints.
      • Retrieval Models: These models surface relevant context based on factors like recency, importance, and relevance to inform the agent's behavior and decision-making processes.
      • Reflection Mechanism: This involves synthesizing memories into higher-level inferences that guide future behavior, generating summaries of past events for better decision-making.
      4. Inter-Agent Communication:
      • The LLM generates natural language statements to facilitate communication between different agents within the system, triggering new actions and responses based on the shared information.
      5. Environment Interaction:
      • The LLM translates reflections and environmental information into actionable plans, considering relationships between agents and observations to optimize both immediate and long-term actions.
      6. Proof-of-Concept Implementations:
      • Examples like AutoGPT, GPT-Engineer, and BabyAGI demonstrate the potential of LLM-powered autonomous agents, highlighting the integration of LLMs with other system components to handle complex tasks and improve over time through continuous learning and refinement.

      Together, these technologies enable LLM-powered autonomous agents to plan, learn, adapt, and interact effectively, supporting their function as powerful general problem solvers.

3. Re-ask with the original question to refine the final answer

  • (Nitpick) However, the original question asked for the key components, so answering in terms of "essential technologies" is not fully satisfying.
    • So let's build one more prompt that refers back to the original question and invoke it.

# After processing sub-questions and accumulating q_a_pairs
final_prompt_template = """
You are a knowledgeable assistant.

Here is the original question:

{original_question}

Here are the relevant question and answer pairs that may help you:

{q_a_pairs}

Using the information above, please provide a detailed and comprehensive answer to the original question.
"""

final_prompt = ChatPromptTemplate.from_template(final_prompt_template)

# Reuse or initialize the LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Create the chain
final_chain = (
    final_prompt
    | llm
    | StrOutputParser()
)

# Invoke the chain to get the final answer
final_answer = final_chain.invoke({"original_question": question, "q_a_pairs": q_a_pairs})
print("Final Answer:\n", final_answer)

Answer:

Final Answer:
# Main Components of an LLM-Powered Autonomous Agent System

- An LLM-powered autonomous agent system integrates a variety of technologies and modules to enable the agent to plan, learn, adapt, and interact with its environment and other agents effectively. Below are the main components of such a system:

## 1. Large Language Models (LLMs)
- Core Controller: The LLM acts as the brain of the system, driving the core functionalities. It understands, generates, and parses instructions and responses through natural language interactions.
  
## 2. Natural Language Interface
- Communication: This interface allows for natural language interactions between the LLM and external components such as memory systems and planning modules. It facilitates effective communication and information exchange within the system.

## 3. Planning
- Task Decomposition: Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are used to break down complex tasks into smaller, manageable subgoals. This helps in planning and executing tasks step-by-step.
- Reflection and Refinement: The agent can perform self-criticism and self-reflection over past actions, learning from mistakes, and refining its approach for future tasks. This continuous improvement enhances the quality and efficiency of the agent's outputs.

## 4. Memory Systems
- Finite Context Length Handling: Due to the finite context length limitation of LLMs, mechanisms such as vector stores and retrieval models are employed to access a larger knowledge pool.
- Retrieval Models: These models surface relevant context based on factors like recency, importance, and relevance to inform the agent's behavior and decision-making processes.
- Reflection Mechanism: This involves synthesizing memories into higher-level inferences that guide future behavior. It generates summaries of past events and uses them for better decision-making.

## 5. Inter-Agent Communication
- Natural Language Statements: The LLM generates natural language statements to facilitate communication between different agents within the system. This enables the sharing of information, triggering new actions and responses.

## 6. Environment Interaction
- Actionable Plans: The LLM translates reflections and environmental information into actionable plans. It takes into account the relationships between agents and observations to optimize both immediate and long-term actions.

## 7. Proof-of-Concept Implementations
- Examples: Implementations like AutoGPT, GPT-Engineer, and BabyAGI demonstrate the potential and capabilities of LLM-powered autonomous agents. These examples highlight the integration of LLMs with other system components to handle complex tasks and improve over time through continuous learning and refinement.

# Summary
An LLM-powered autonomous agent system is composed of several key components that work together to enable sophisticated functionalities. The Large Language Model (LLM) serves as the core controller, interfacing with other modules through a natural language interface. The planning module uses techniques like Task Decomposition and Reflection and Refinement to manage tasks efficiently. Memory systems overcome the finite context length of LLMs by employing vector stores and retrieval models, aiding in better decision-making. Inter-agent communication and environment interaction modules ensure seamless information exchange and actionable planning. Proof-of-concept implementations illustrate the practical applications and continuous improvement potential of these systems. Together, these components create a robust framework for autonomous agents capable of complex problem-solving and adaptive learning.

(Note) Answering every sub-question individually, then synthesizing

  • The approach above answers each sub-question sequentially, feeding earlier answers forward, and synthesizes the results into the final answer as it goes.
  • The example below instead answers each sub-question independently and then synthesizes those answers into the final answer.
from langchain import hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# RAG prompt
prompt_rag = hub.pull("rlm/rag-prompt")

def retrieve_and_rag(question, prompt_rag, sub_question_generator_chain):
    """ํ•˜์œ„ ์งˆ๋ฌธ์— ๋Œ€ํ•œ RAG ์ˆ˜ํ–‰"""
    sub_questions = sub_question_generator_chain.invoke({"question":question})
    rag_results = []

    for sub_question in sub_questions:
        retrieved_docs = retriever.get_relevant_documents(sub_question)
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs,
                                                                "question": sub_question})
        rag_results.append(answer)

    return rag_results, sub_questions

answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)

def format_qa_pairs(questions, answers):
    """์งˆ๋ฌธ๊ณผ ๋‹ต๋ณ€์„ ํฌ๋งทํŒ…"""
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Final RAG prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":context,"question":question})

๋‹ต๋ณ€: ๊ด€์ ์ด ๋‹ค๋ฅด๊ฒŒ ๋‹ตํ•˜์ง€๋งŒ ์ „ ์ด์ „ ๋ฒ„์ „์˜ ๋Œ€๋‹ต์ด ์ข€ ๋” ์ œ ์ทจํ–ฅ์ธ๊ฑธ๋กœ๐Ÿ˜Ž

Question: 'What are the main components of an LLM-powered autonomous agent system?'

Answer:
An LLM-powered autonomous agent system is composed of several key components that work together to enable the agent's functionality. These include:

1. Hardware Components: The primary hardware components consist of the large language model (LLM) itself, memory storage for managing historical data and interactions, and processing units capable of handling the intensive computational demands required for planning and task decomposition.
2. Software Frameworks: Essential software frameworks for building such a system include AutoGPT, GPT-Engineer, and BabyAGI. These frameworks are proof-of-concept demos that illustrate how LLMs can serve as the core controller of autonomous agents, handling tasks such as planning, task decomposition, and self-reflection to continually improve their performance.
3. Natural Language Processing (NLP) Modules: NLP modules act as the interface between the LLM and other external components like memory and tools. They enable the agent to parse and understand model outputs, which is crucial for effective task decomposition, planning, and interaction with other system components. However, managing the reliability of these outputs is critical, as errors can impact the agent's performance.

Together, these hardware and software components form a cohesive system that supports the complex functionalities required for an autonomous agent to operate effectively.

Summary

  • Decomposition breaks a complex question into several sub-questions, solves each one independently, and then produces the final answer.
  • The key idea is to use each previous answer in the next question, solving the problem incrementally.
  • This makes it possible to address complex problems systematically and improves retrieval performance.

Part 8 (Step-Back)

  • This lecture explains Step-Back prompting, the fourth "Query Translation" method in the RAG (Retrieval-Augmented Generation) pipeline.
  • The step-back technique improves document retrieval by transforming the question to a more abstract level.

๋ฌธ์ œ ์ •์˜ ๋ฐ ์ ‘๊ทผ ๋ฐฉ๋ฒ•

  • ์ด์ „ ๊ธฐ๋ฒ•๋“ค์ธ Multi-query์™€ RAG Fusion์€ ์งˆ๋ฌธ์„ ์—ฌ๋Ÿฌ ๊ด€์ ์—์„œ ๋‹ค์‹œ ์“ฐ๊ฑฐ๋‚˜, ์งˆ๋ฌธ์„ ํ•˜์œ„ ๋ฌธ์ œ๋กœ ๋ถ„ํ•ดํ•˜์—ฌ ๊ฐ๊ฐ์„ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ๋ฒ•์— ์ค‘์ ์„ ๋‘์—ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ Step Back ๋ฐฉ์‹์€ ์งˆ๋ฌธ์„ ๋” ์ถ”์ƒ์ ์ธ ์งˆ๋ฌธ์œผ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ๊ณ ์ฐจ์›์ ์ธ ๊ฐœ๋…์„ ์ค‘์‹ฌ์œผ๋กœ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ์ด ํŠน์ง•์ž…๋‹ˆ๋‹ค.
  • ๋…ผ๋ฌธ โ€œTake a Step Back: Evoking Reasoning via Abstraction in Large Language Modelsโ€์—์„œ๋Š” ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ(LLM)์˜ ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•œ ์ƒˆ๋กœ์šด ํ”„๋กฌํ”„ํŠธ ๊ธฐ๋ฒ•์ธ Step-Back Prompting์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.
    • ์ถ”์ƒํ™”๋ฅผ ํ†ตํ•œ ์ถ”๋ก  ๊ฐœ์„ : Step-Back Prompting์€ ๋ชจ๋ธ์ด ๋ฌธ์ œ๋ฅผ ์ง์ ‘ ํ•ด๊ฒฐํ•˜๊ธฐ ์ „์—, ๋จผ์ € ๋ฌธ์ œ๋ฅผ ํ•œ ๋‹จ๊ณ„ ๋’ค๋กœ ๋ฌผ๋Ÿฌ๋‚˜์„œ ์ถ”์ƒํ™”๋œ ๊ณ ์ˆ˜์ค€ ๊ฐœ๋…์ด๋‚˜ ์›๋ฆฌ๋ฅผ ๋„์ถœํ•˜๋„๋ก ์œ ๋„ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ถ”์ƒํ™” ๋‹จ๊ณ„๋Š” ๋ชจ๋ธ์ด ๋ณต์žกํ•œ ๋ฌธ์ œ์—์„œ ์„ธ๋ถ€์ ์ธ ์˜ค๋ฅ˜๋ฅผ ์ค„์ด๊ณ  ๋” ๋†’์€ ์ •ํ™•๋„๋กœ ์ถ”๋ก ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.
    • ๋‹ค์–‘ํ•œ ๋„๋ฉ”์ธ์—์˜ ์ ์šฉ: ์ด ๊ธฐ๋ฒ•์€ ๋ฌผ๋ฆฌํ•™, ํ™”ํ•™, ์‹œ๊ฐ„ ์ง€์‹ ์งˆ๋ฌธ(TimeQA), ๋‹ค๋‹จ๊ณ„ ์ถ”๋ก (Multi-Hop Reasoning) ๋“ฑ ๋‹ค์–‘ํ•œ ์ž‘์—…์— ์ ์šฉ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ์ด๋ฅผ ํ†ตํ•ด ํ•™์Šต๋œ ์›๋ฆฌ๋ฅผ ๋‹ค์–‘ํ•œ ์ƒํ™ฉ์— ์‘์šฉํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
    • ์„ฑ๋Šฅ ๋น„๊ต: Step-Back Prompting์€ ๋‹ค๋ฅธ ๊ธฐ๋ฒ•๋“ค, ํŠนํžˆ Chain-of-Thought(CoT)๋‚˜ Take-a-Deep-Breath(TDB) ํ”„๋กฌํ”„ํŠธ์™€ ๋น„๊ตํ•˜์—ฌ, ์ถ”๋ก  ์ž‘์—…์—์„œ ์ผ๊ด€๋˜๊ฒŒ ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.

  • Figure 2 of the paper shows how Step-Back Prompting is applied to two tasks: a physics problem and a temporal question. In each example, the model generates a "step-back" question first and then reasons from it.

    1. Physics problem (MMLU Physics example)

      Problem: "What happens to the pressure P of an ideal gas if the temperature is doubled and the volume is increased eightfold?"

      • Original approach: the model tries to solve the problem head-on. With Chain-of-Thought (CoT), errors can creep into the intermediate steps.
      • Applying Step-Back Prompting: the model is first guided to ask the abstract question "① What are the physics principles behind this problem?".
        • That question leads the model to recall ② the ideal gas law (PV = nRT), from which it continues solving the problem.
          1. Abstraction step: extract the underlying physics principle, the ideal gas law.
          2. Reasoning step: apply the ideal gas law to compute the pressure change from the temperature and volume changes; since P ∝ T/V, the factor is 2/8, so the pressure falls to one quarter of its original value.

      Through this process, Step-Back Prompting lets the model avoid errors in the detailed calculation and derive the correct answer from the abstract principle.

    2. Temporal question (TimeQA example)

      Problem: "Which school did Estella Leopold attend between August and November 1954?"

      • Original approach: the model tries to find Estella Leopold's education record directly within the given time window. With CoT, the tight time constraint can cause errors in the intermediate steps.
      • Applying Step-Back Prompting: the model first generates the more abstract question "What is Estella Leopold's education history?". It recalls her overall education history and then infers the answer for the specific time window from it.
        1. Abstraction step: derive the high-level concept "Estella Leopold's overall education history".
        2. Reasoning step: from the abstracted history, conclude that from August to November 1954 she was in a doctoral program at Yale University.

      In this example, Step-Back Prompting avoids errors that could arise from the narrow time constraint and helps solve the problem from a broader perspective.

Step Back ๊ธฐ๋ฒ•์˜ ์ง๊ด€

  • Step Back ๋ฐฉ์‹์—์„œ๋Š” ๊ธฐ์กด ์งˆ๋ฌธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋” ์ถ”์ƒ์ ์ธ ์งˆ๋ฌธ์„ ์ƒ์„ฑํ•˜์—ฌ, ๋‘ ๊ฐ€์ง€ ์งˆ๋ฌธ(์›๋ž˜ ์งˆ๋ฌธ๊ณผ ์ถ”์ƒํ™”๋œ ์งˆ๋ฌธ)์„ ๋™์‹œ์— ๊ฒ€์ƒ‰ํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ตœ์ข… ๋‹ต๋ณ€์„ ๋„์ถœํ•ฉ๋‹ˆ๋‹ค.
  • ์ด๋Š” ํŠนํžˆ ๊ฐœ๋…์ ์ธ ์ง€์‹์„ ๋ฐ”ํƒ•์œผ๋กœ ๊ฒ€์ƒ‰ํ•ด์•ผ ํ•˜๋Š” ๋„๋ฉ”์ธ์—์„œ ์œ ์šฉํ•˜๋ฉฐ, ๋ฌธ์„œ์˜ ๋‚ด์šฉ์ด ํŠน์ • ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต์„ ์ง์ ‘ ์ œ๊ณตํ•˜์ง€ ์•Š์„ ๊ฒฝ์šฐ ๋” ์ผ๋ฐ˜์ ์ธ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•˜์—ฌ ๋ณด์™„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Code Walkthrough

1. Create the few-shot examples

from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# fewshot
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "What can the members of The Police do?",
    },
    {
        "input": "Jan Sindelโ€™s was born in what country?",
        "output": "What is Jan Sindelโ€™s personal history?",
    },
]

example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
            You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. 
            Here are a few examples:
            """,
        ),
        few_shot_prompt,
        ("user", "{question}"),
    ]
)

generate_queries_step_back = prompt | ChatOpenAI(temperature=0) | StrOutputParser()
  • Step Back ํ”„๋กฌํ”„ํŒ…์„ ์œ„ํ•œ ์˜ˆ์‹œ(few-shot examples)๋ฅผ ์ œ๊ณตํ•˜์—ฌ, ๋ชจ๋ธ์ด ์›๋ž˜ ์งˆ๋ฌธ์„ ์–ด๋–ป๊ฒŒ ์ถ”์ƒ์ ์ธ ์งˆ๋ฌธ์œผ๋กœ ๋ณ€ํ™˜ํ• ์ง€ ํ•™์Šต์‹œํ‚ต๋‹ˆ๋‹ค.
  • ์›๋ž˜ ์งˆ๋ฌธ(original question)์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ถ”์ƒ์ ์ธ ์งˆ๋ฌธ์„ ์ƒ์„ฑํ•˜๋Š” ํ”„๋กฌํ”„ํŠธ(*๋…ผ๋ฌธ ์‚ฌ์šฉ ํ”„๋กฌํ”„ํŠธ)๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

2. Generate the step-back question, retrieve, and answer

question = "What is task decomposition for LLM agents?"
generate_queries_step_back.invoke({"question": question})
  • For the question "What is task decomposition for LLM agents?", the generated step-back question is:
    • 'How do LLM agents handle complex tasks?'
    • This question asks about the overall strategy by which LLM agents handle complex tasks, aiming to capture the high-level concept rather than specific details.
    • Retrieval with this step-back question is therefore more likely to surface answers about overall task handling rather than the specifics of task decomposition.
response_prompt_template = """
You are an expert of world knowledge. I am going to ask you a question. 
Your response should be comprehensive and not contradicted with the following context if they are relevant. 
Otherwise, ignore them if they are not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:
"""

response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

chain = (
    {
        # Retrieval for the original question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieval for the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        "question": lambda x: x["question"],
    }
    | response_prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

chain.invoke({"question": question})

  • ์›๋ž˜ ์งˆ๋ฌธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ถ”์ƒ์ ์ธ ์งˆ๋ฌธ์„ ์ƒ์„ฑํ•œ ํ›„, ๊ทธ ์งˆ๋ฌธ์„ ๋ฐ”ํƒ•์œผ๋กœ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  • ์›๋ž˜ ์งˆ๋ฌธ๊ณผ Step Back ์งˆ๋ฌธ์„ ๋ชจ๋‘ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ๊ฐ์˜ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ํ•ฉ์น˜๊ณ , ์ตœ์ข… ๋‹ต๋ณ€์„ ๋„์ถœํ•˜๋Š” ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

Summary

  • Step-Back improves retrieval by transforming the original question to a more abstract level.
  • When the original question is too specific, the technique generates a more general question, retrieves a broader range of information, and combines the results to produce the final answer.
  • It is especially useful in domains that require retrieval over conceptual knowledge, and it is effective when the documents split into conceptual and specific content.

Part 9 (HyDE)

  • This lecture explains HyDE (Hypothetical Document Embeddings), the fifth "Query Translation" method in the RAG (Retrieval-Augmented Generation) pipeline.

  • The underlying paper, "Precise Zero-Shot Dense Retrieval without Relevance Labels", proposes the Hypothetical Document Embeddings (HyDE) technique.
    • HyDE performs retrieval by generating a "hypothetical document" related to the query. This document does not actually exist; it is produced by a trained language model (e.g., InstructGPT).
  • The main steps of HyDE are as follows (a minimal sketch follows the list):
    1. Hypothetical document generation: given a query, the language model writes a hypothetical document that answers it. This document is relevant to the query but may not be factual.
    2. Document embedding: the hypothetical document is converted into an embedding vector by a contrastively trained encoder (e.g., Contriever). The embedding filters out unnecessary details from the hypothetical document and helps retrieve real documents related to the query.
    3. Document retrieval: finally, vector similarity is computed between that embedding and the real documents in the corpus, and the most similar documents are retrieved.
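
A minimal sketch of these three steps, reusing the vectorstore from the Part 5 setup; the hypothetical passage here is a stand-in for LLM-generated text, and the Chroma similarity search plays the role of the explicit vector-similarity computation:

from langchain_openai import OpenAIEmbeddings

hypothetical_passage = (
    "Task decomposition breaks a complex task into smaller subgoals, "
    "for example via Chain of Thought or Tree of Thoughts prompting."
)

# Step 2: embed the hypothetical document rather than the raw query.
vector = OpenAIEmbeddings().embed_query(hypothetical_passage)

# Step 3: retrieve the real corpus documents closest to that embedding.
real_docs = vectorstore.similarity_search_by_vector(vector, k=4)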

๋ฌธ์ œ ์ •์˜ ๋ฐ ์ ‘๊ทผ ๋ฐฉ๋ฒ•

  • RAG์˜ ๊ธฐ๋ณธ ํ๋ฆ„์—์„œ๋Š” ์งˆ๋ฌธ๊ณผ ๋ฌธ์„œ๋ฅผ ์ž„๋ฒ ๋”ฉํ•˜์—ฌ ์ž„๋ฒ ๋”ฉ ๊ณต๊ฐ„์—์„œ์˜ ์œ ์‚ฌ์„ฑ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•ฉ๋‹ˆ๋‹ค.
    • ํ•˜์ง€๋งŒ ์งˆ๋ฌธ๊ณผ ๋ฌธ์„œ๋Š” ๋งค์šฐ ๋‹ค๋ฅธ ์œ ํ˜•์˜ ํ…์ŠคํŠธ ๊ฐ์ฒด์ž…๋‹ˆ๋‹ค.
      • Why โ‡’ ? ๋ฌธ์„œ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ๊ธธ๊ณ  ๋ฐ€๋„๊ฐ€ ๋†’์€ ์ •๋ณด๋กœ ๊ตฌ์„ฑ๋œ ๋ฐ˜๋ฉด, ์งˆ๋ฌธ์€ ์งง๊ณ  ์‚ฌ์šฉ์ž์— ์˜ํ•ด ๋น„๊ตฌ์กฐ์ ์œผ๋กœ ์ž‘์„ฑ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • HyDE์˜ ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” ์งˆ๋ฌธ์„ ๊ฐ€์ƒ์˜ ๋ฌธ์„œ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ๋ฌธ์„œ ์ž„๋ฒ ๋”ฉ ๊ณต๊ฐ„์— ๋งคํ•‘ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. (์•„๋ž˜ ๊ทธ๋ฆผ ์ฐธ๊ณ )

  • ์ฆ‰, ๊ธฐ์กด์ฒ˜๋Ÿผ ์งˆ๋ฌธ์„ ๋ฐ”๋กœ ์ž„๋ฐฐ๋”ฉ์„ ์‹œ์ผœ์„œ ๊ฒ€์ƒ‰ํ•˜๋Š” ๋Œ€์‹ , ๊ฐ€์ƒ์˜ ๋ฌธ์„œ๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ๊ฒ€์ƒ‰์— ํ™œ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
    • ์ด ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ์งˆ๋ฌธ์˜ ์ž„๋ฒ ๋”ฉ์ด ๋ถ€์กฑํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ์—๋„ ๊ฐ€์ƒ์˜ ๋ฌธ์„œ๊ฐ€ ๋” ์œ ์‚ฌํ•œ ์‹ค์ œ ๋ฌธ์„œ์™€ ์ž˜ ์ผ์น˜ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.
    • ์™œ ๊ฐ€์ƒ์˜ ๋ฌธ์„œ๊ฐ€ ์งˆ๋ฌธ๋ณด๋‹ค ๋” ์›๋ณธ ๋ฌธ์„œ์™€ ๊ฐ€๊นŒ์›Œ์งˆ ์ˆ˜ ์žˆ๋Š”๊ฐ€?
      • ๊ฐ€์ƒ์˜ ๋ฌธ์„œ๋Š” ์›๋ž˜ ์งˆ๋ฌธ๋ณด๋‹ค ๋” ๋งŽ์€ ์ •๋ณด์™€ ๋งฅ๋ฝ์„ ํฌํ•จํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ๊ด€๋ จ๋œ ์‹ค์ œ ๋ฌธ์„œ์™€ ๋” ๊ฐ€๊นŒ์šด ๋ฒกํ„ฐ ๊ณต๊ฐ„์— ์œ„์น˜ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
      • ์ด๋Š” ๊ฐ€์ƒ ๋ฌธ์„œ๊ฐ€ ์งˆ๋ฌธ์—์„œ ๋ถ€์กฑํ•œ ๋ถ€๋ถ„์„ ๋ณด์™„ํ•˜๊ณ , ์–ธ์–ด ๋ชจ๋ธ์˜ ํ•™์Šต๋œ ํŒจํ„ด์„ ํ™œ์šฉํ•ด ๋” ํ’๋ถ€ํ•œ ๊ด€๋ จ ์ •๋ณด๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
      • ์ด๋กœ ์ธํ•ด ๊ฐ€์ƒ ๋ฌธ์„œ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์›๋ž˜ ์งˆ๋ฌธ์„ ์ง์ ‘ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ๋” ๋‚˜์€ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ๋„์ถœํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Code Walkthrough

1. Define the HyDE document-generation prompt and generate the document

from langchain.prompts import ChatPromptTemplate

template = """
Please write a scientific paper passage to answer the question
Question: {question}
Passage:
"""

prompt_hyde = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_docs_for_retrieval = (
    prompt_hyde | ChatOpenAI(temperature=0) | StrOutputParser()
)

# Example question
question = "What is task decomposition for LLM agents?"

generate_docs_for_retrieval.invoke({"question":question})

  • ์งˆ๋ฌธ์„ ๋ฐ”ํƒ•์œผ๋กœ ๊ฐ€์ƒ์˜ ๊ณผํ•™์  ๋ฌธ์„œ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
  • ์›๋ž˜ ์งˆ๋ฌธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ€์ƒ์˜ ๋ฌธ์„œ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์—„์ฒญ ๊ทธ๋Ÿด ๋“ฏํ•˜๊ฒŒ ๋…ผ๋ฌธ ๊ตฌ์กฐ๋กœ ์ž‘์„ฑ๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
Title: Task Decomposition for Large Language Model (LLM) Agents

Abstract:
Task decomposition for Large Language Model (LLM) agents refers to the systematic process of breaking down complex tasks into smaller, more manageable subtasks, which can be sequentially or concurrently addressed by the model. This methodology aims to enhance the efficiency, accuracy, and overall performance of LLMs when faced with multifaceted queries or tasks. This passage explores the principles, methodologies, and implications of task decomposition in the context of LLM agents.

Introduction:
Large Language Models (LLMs), such as GPT-4, have demonstrated remarkable capabilities in natural language understanding, generation, and various other language-related tasks. However, their performance can be significantly improved through the strategic application of task decomposition. By dividing a complex task into discrete, manageable components, LLM agents can process information more effectively, reduce cognitive load, and minimize errors.

Principles of Task Decomposition:
Task decomposition is grounded in several key principles:

1. Modularity: Breaking down a task into independent or semi-independent modules allows for parallel processing and simplifies error identification and correction.
2. Hierarchy: Establishing a hierarchical structure where higher-level tasks are decomposed into lower-level subtasks ensures a coherent and organized approach to problem-solving.
3. Sequential Dependency: Understanding the dependencies between subtasks enables the LLM to process them in the correct order, ensuring that intermediate results are correctly utilized in subsequent steps.

Methodologies:
There are various methodologies for task decomposition, each tailored to specific types of tasks and LLM capabilities:

1. Top-Down Decomposition: This approach begins with the overarching task and progressively breaks it down into smaller subtasks. For example, answering a complex question might involve identifying key concepts, gathering relevant information, synthesizing data, and constructing a coherent response.
2. Bottom-Up Decomposition: Conversely, this method starts with identifying fundamental subtasks and gradually combines them to form a solution to the larger task. This can be useful in tasks where the basic components are well understood, but their integration is complex.
3. Hybrid Decomposition: Combining top-down and bottom-up approaches can provide a balanced strategy, leveraging the strengths of both methods to handle diverse tasks effectively.

Implications for LLM Performance:
The adoption of task decomposition has several implications for the performance of LLM agents:

1. Enhanced Accuracy: By focusing on smaller, more manageable subtasks, LLMs can provide more precise and accurate responses, reducing the likelihood of errors that may occur when tackling complex tasks holistically.
2. Improved Efficiency: Decomposing tasks allows for parallel processing, which can significantly speed up task completion and optimize resource utilization.
3. Scalability: Task decomposition facilitates the scaling of LLM applications to handle increasingly complex and diverse tasks, making them more versatile and robust.

Conclusion:
Task decomposition is a vital strategy for optimizing the performance of LLM agents. By breaking down complex tasks into smaller, manageable components, LLMs can improve their accuracy, efficiency, and scalability. As LLM technology continues to evolve, the principles and methodologies of task decomposition will play an increasingly important role in harnessing the full potential of these powerful models.

Keywords: Task decomposition, Large Language Models, LLM agents, modularity, hierarchical structure, sequential dependency, top-down decomposition, bottom-up decomposition, hybrid decomposition.

2. ์ƒ์„ฑ๋œ ๊ฐ€์ƒ ๋ฌธ์„œ๋ฅผ ์‚ฌ์šฉํ•œ ๋ฌธ์„œ ๊ฒ€์ƒ‰

# ๊ฒ€์ƒ‰ ์ฒด์ธ
retrieval_chain = generate_docs_for_retrieval | retriever
retireved_docs = retrieval_chain.invoke({"question":question})
retireved_docs

  • ์ƒ์„ฑ๋œ ๊ฐ€์ƒ์˜ ๋ฌธ์„œ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ํ•ด๋‹น ๋ฌธ์„œ์™€ ๊ด€๋ จ์ด ๋†’์€ ๋ฌธ์„œ๋“ค์„ ๊ฒ€์ƒ‰ํ•ฉ๋‹ˆ๋‹ค.

3. ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ตœ์ข… ๋‹ต๋ณ€ ์ƒ์„ฑ

# RAG prompt
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

# ์ตœ์ข… RAG ์ฒด์ธ ์‹คํ–‰
final_rag_chain.invoke({"context":retireved_docs,"question":question})

  • ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋“ค์„ ๋ฐ”ํƒ•์œผ๋กœ ์›๋ž˜ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ์ตœ์ข… ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

Summary

  • HyDE (Hypothetical Document Embeddings) converts the question into a hypothetical document and uses that document to perform retrieval.
  • When the question itself is not well suited to retrieval, the hypothetical document can help find more similar documents in the document embedding space.
  • The approach is especially useful when questions are short or loosely structured, and it has the advantage that the hypothetical-document prompt can be tailored to the domain.


