OpenAI Introduces GPT-4o Image Generation: A New Era of AI Image Generation
Original post: https://velog.io/@euisuk-chung/GPT4o-이미지-생성-기능-소개-AI-이미지-생성의-새로운-시대
Hello! The feature released today is one I have been really, really waiting for!!
Reference links
On March 25, 2025, OpenAI officially launched its Native Image Generation feature through the GPT-4o model. The announcement is notable because it goes beyond a simple feature improvement: AI image generation has evolved into an omnimodal (Omnimodal) experience in which text and images are fully integrated.
Just like today's thumbnail, it finally looks possible to "add something myself, change the style of a photo, or place text accurately in a picture"!!
To be honest, I have wanted to do this for a long time, but there were always limits.
- In particular, lowercase and uppercase letters were not distinguished accurately, and the text generated inside images came out broken, which was disappointing…
Now, as the news articles below show, editing works so well that copyright concerns are becoming a real issue.
- Yonhap News: OpenAI image generation model and the Studio Ghibli copyright infringement controversy
- Chosun Ilbo: Ghibli, Disney, or Pororo style? Copyright controversy over ChatGPT's new image generator
Someone who jumped straight into generating images, lol
Example 1. Editing a famous meme - SH** UP AND TAKE MY MONEY!!
Example 2. Editing an image of Trump - see the Chosun Ilbo article above
GPT-4o image generation after this update
Every one of those areas seems to have improved well beyond my expectations, which is really exciting! (It did get a bit slower, though.)
So, shall we take a proper look at what has improved?
OpenAI explained that, by integrating its most advanced image generator into GPT-4o, it has shifted the direction of image generation from "pretty pictures" to "useful tools."
"We've built our most advanced image generator yet into GPT-4o. The result: image generation that is not only beautiful, but useful." - OpenAI, March 25, 2025
Where the goal used to be simply generating visually impressive images, the focus has now moved to use cases that can be applied to real work.
New feature summary
- Multi-turn image generation (Multi-turn Generation)
  Revise and refine an image over the course of a chat. Well suited to character design and scene composition.
- Precise instruction following
  Faithfully reflects complex text instructions (such as placing 10 to 20 objects).
- In-context learning (In-context Learning)
  Learns style and composition from uploaded images, then generates new images.
- World knowledge (World Knowledge)
  Uses the text model's knowledge to generate images more intelligently.
- Accurate text rendering (Text Rendering)
  Generates images containing text, such as signs, menus, and infographics, accurately and cleanly.
- Diverse styles & photorealism
  Supports a wide range of styles, including comics, watercolor, digital painting, and photorealistic rendering.
This blog post is divided into the OpenAI Youtube Demo (Part 1) and the OpenAI Blog content (Part 2).
PART 1. OpenAI Youtube Demo Overview
The presentation centered on live demos, with each session showing, step by step, precise text rendering, accurate attribute binding, flexible meme image generation, and the use of multimodal input.
Evolution of image generation quality
Early in the presentation, the presenters stressed that "the image generation quality of GPT-4o is on a different level from past models."
They demonstrated this with the image generation demos below.
DEMO 1. Requesting a POV (first-person view) image:
A scene with a sheet of notes on paper in front of the user and a film crew in the background. In this image, GPT-4o accurately reflected the following details:
- The background is blurred (a depth-of-field effect)
- The text on the paper is rendered accurately
- Phrases such as "SPEAKER NOTES", "PART 1", and "PART 2" appear without typos
When I generated it myself, it also came out properly.
Prompt used (ENG):
A tall, POV image of me in an old loft. Film crew present, facing me.
There is a sheet of paper on the table. There is text written on the paper (focus on paper, background out of focus. Paper occupies most of the page)
The text reads:
**SPEAKER NOTES:**
**PART 1**
- native support for image generation in a model as powerful as GPT-4
- render full paragraphs of text and combine images
- "rough around the edges"
- make it accessible
**PART 2**
- memes?
- "we're surrounded by images"
- images that "persuade, inform and educate".
- "workhorse images"
- gives the power of useful image generation to the world
Prompt characteristics
The image generation prompt follows a three-part structure:
- Scene setting (Scene Setting)
  - Clearly sets the space (an old loft) and the viewpoint (POV, tall framing)
  - People in the scene (a film crew, looking at me)
- Focal object (Focal Object)
  - The center of the image: the sheet of paper on the table
  - "focus on paper" and "background out of focus" specify the depth of field
- Embedded text instruction (Embedded Text Instruction)
  - States the content to be written on the paper verbatim (the lines after "The text reads:")
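The three-part structure above can be sketched as a small helper. This is only an illustration: the `ImagePrompt` class and its field names are hypothetical and not part of any OpenAI tooling. It simply joins the scene, the focal object, and the verbatim text into one prompt string.

```python
from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container for the three prompt parts (not an official API)."""

    scene: str                      # scene setting: place, viewpoint, people
    focal_object: str               # what the image should focus on
    text_lines: list[str] = field(default_factory=list)  # text to render verbatim

    def render(self) -> str:
        """Join the parts in the same order the demo prompt uses."""
        parts = [self.scene, self.focal_object]
        if self.text_lines:
            parts.append("The text reads:")
            parts.extend(self.text_lines)
        return "\n".join(parts)


prompt = ImagePrompt(
    scene="A tall, POV image of me in an old loft. Film crew present, facing me.",
    focal_object=(
        "There is a sheet of paper on the table "
        "(focus on paper, background out of focus)."
    ),
    text_lines=["SPEAKER NOTES:", "PART 1", "- make it accessible"],
)
print(prompt.render())
```

Keeping the text to be rendered in its own list makes it easy to review the exact wording before sending the prompt.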
If you translate the prompt above literally and make the request in Korean, you can see that Korean text rendering is not yet perfect(?).
Below is the Korean prompt, a literal translation of the English prompt used to create the image above.
Prompt used (KOR):
낡은 로프트 안. 카메라 시점은 인물의 시선에서 바라보는 듯한 Tall, POV 이미지.
내 앞에 영화 촬영팀이 있으며, 나를 바라보고 있다.
책상 위에 종이 한 장이 놓여 있고, 그 종이에 글이 적혀 있다.
**이미지의 초점은 종이에 맞춰져 있으며**, 배경은 흐릿하게 처리되어 있다.
종이가 이미지의 대부분을 차지한다.
종이에는 다음과 같은 내용이 적혀 있다:
**발표자 노트:**
**1부**
- GPT-4처럼 강력한 모델이 이미지 생성을 기본으로 지원
- 긴 문단의 텍스트를 렌더링하고 이미지를 결합
- "조금은 조악하다"
- 누구나 쉽게 접근 가능하게 만들 것
**2부**
- 밈(meme)은 어떨까?
- "우리는 이미지로 둘러싸여 있다"
- 사람들을 "설득하고, 정보 전달하고, 교육하는" 이미지들
- "일꾼 같은 이미지들"
- 유용한 이미지 생성을 누구나 할 수 있도록 하는 기술
This demo was a representative case that naturally satisfied a combination of requirements: accurate text inside the image, viewpoint, and background blur.
DEMO 2.1. Interacting with images
- Interacting with a photo you took yourself
- Take a selfie and upload it
- Ask "turn this into an anime frame"
- GPT-4o generated an image converted to anime style while preserving the face, clothing, expression, and background
DEMO 2.2. Meme generation: taking it a step further
The demo then continues in the following flow: selfie → anime conversion → meme generation.
Example prompt:
"Make this into a meme. The caption is 'Feel the AGI'."
GPT-4o generated a meme image with fitting humor and composition while keeping the context of the earlier prompt and image.
DEMO 3. Making your own trading card & expanding into a personal creative tool
In this demo, a developer creates "a trading card that looks like it was made by a professional designer."
- A photo of an actual card
- A photo of their dog, Sanji
- Text information for the card (name, stats, year, background setting, etc.)
Prompt used (ENG):
I want to make a trade card for our launch.
I've uploaded an example from the Sora launch, please design a trade card in the same style. The picture on the trade card should be a Shiba Inu snowboarding β please use my dog Sanji from the photo.
A couple details to note:
- Mention "GPT-4o Image" and "2025" in the headline as that's what we are launching today
- Use "Generative AI image model" as its ability. Be sure to mention "GPT-4o" and "native multimodal" in the description
- Sanji weighs "30 lbs", and is "14 inches" tall
- Include these stats on the card, and make them render in a 2x2 grid view
  • "speed": "1 min"
  • "genre": "any"
- Add a fun punch line in the end!
Prompt used (KOR):
우리 런칭을 위한 **트레이딩 카드(trade card)**를 만들고 싶어.
Sora 런칭 때 사용된 예시 이미지를 업로드했으니, 동일한 스타일로 트레이딩 카드를 디자인해줘.
카드에 들어갈 이미지는 **스노보드를 타는 시바견(Shiba Inu)**이면 좋겠고,
내 강아지 **Sanji**의 사진을 활용해줘.
다음 세부 사항들을 꼭 반영해줘:
- 제목에는 **"GPT-4o Image"**와 **"2025"**라는 문구가 꼭 들어가야 해. 오늘 런칭되는 내용을 반영한 거니까.
- 능력(ability)으로는 **"Generative AI image model"**을 사용해줘.
  설명에는 **"GPT-4o"**와 **"native multimodal"**이라는 표현을 꼭 넣어줘.
- Sanji는 **몸무게 30파운드(30 lbs)**, **키는 14인치(14 inches)**야.
- 다음 스탯(stat)들을 카드에 포함하고, **2x2 그리드 형태**로 보여줘:
  • 속도(speed): "1분"
  • 장르(genre): "any"
- 마지막에 **재미있는 한 줄 문장(punch line)**도 추가해줘!
The Mint card I made myself (reproducing the photo exactly still feels a bit lacking)
GPT-4o generated a trading card that imitates the style of the photo while accurately reflecting the specified text and numeric data.
Still, what is surprising is that the card's layout, alignment, font style, and even the small print were rendered very precisely.
DEMO 4. Image editing, compositing, and making a commemorative coin
The final session demonstrated the process of making a commemorative coin (Memorial Coin) with GPT-4o.
- The four images generated earlier are used as background elements
- "My color palette" is applied using hex color codes
- A commemorative phrase and date are inserted
- Then a request to "make the background transparent" → generated as a PNG
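Because the palette is passed as hex color codes, it can help to validate the codes before putting them into a prompt. The following is a minimal sketch under stated assumptions: `palette_instruction` is a hypothetical helper, and the example colors are made up, since the demo's actual palette is not given in this post.

```python
import re

# Matches a 6-digit hex color such as "#1A2B3C".
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def palette_instruction(colors: list[str]) -> str:
    """Build one prompt sentence from hex codes, rejecting malformed ones."""
    bad = [c for c in colors if not HEX_COLOR.fullmatch(c)]
    if bad:
        raise ValueError(f"not a 6-digit hex color: {bad}")
    return "Use this color palette: " + ", ".join(c.upper() for c in colors)


# Hypothetical palette, for illustration only.
print(palette_instruction(["#1a2b3c", "#FFD700"]))
```

Rejecting a malformed code early avoids sending the model an ambiguous color name it has to guess at.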
This demo showed that GPT-4o does not stop at single-image generation: through multi-turn (Multi-turn) interaction, it can also edit images and customize them iteratively.
You could also ask follow-up requests like the ones below to steer the image in the direction you want!!
- "Make a design for the back side too"
- "Apply a different color to each name"
- "Edit only this part"
PART 2. OpenAI Blog Overview
As mentioned at the top of this post, the GPT-4o Image Generation update consists of the following.
New feature summary
- Multi-turn image generation (Multi-turn Generation)
- Precise instruction following
- In-context learning (In-context Learning)
- World knowledge (World Knowledge)
- Accurate text rendering (Text Rendering)
- Diverse styles & photorealism
This chapter takes a closer look at each feature.
Below, GPT-4o's image generation is organized into six core features, with a detailed example prompt for each. It is structured to be useful when writing prompts for practical work.
The prompts and images in the examples below are official cases from the OpenAI Blog.
(Some content has been slightly added or changed.)
1. Multi-turn Generation (multi-turn image generation)
GPT-4o can carry the image generation process forward step by step in a chat and refine the result incrementally.
Key capabilities
- Images can be revised conversationally
- Consistency is maintained (character style, background elements, UI, etc.)
- Complex scenes can be developed in stages
Example Prompt
CAT (Image)
→ Give this cat a detective hat and a monocle
→ Adds a detective hat and a monocle to the base image
→ turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography
→ Converts to an AAA game style and adds a UI overlay
→ Game HUD, consistent icons, upgraded resolution/style
→ update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors
→ Changes the aspect ratio (16:9) and the viewpoint (third person), and adds a background setting
→ Advanced adjustments such as a steampunk Manhattan background, color tone, and lighting effects
Additional example
I also applied the prompt below to put a Batman mask on the cat.
→ turn this cat's detective hat and a monocle to batman mask.
→ Creates a new character style by swapping elements
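One way to think about the multi-turn flow is as an ordered list of instructions applied to the same base image. The sketch below is a hypothetical bookkeeping helper (`EditSession` is not an OpenAI API and makes no network calls). It only records the turns so they can be reviewed or replayed.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical record of a multi-turn image editing conversation."""

    base_image: str                          # label or path of the starting image
    turns: list[str] = field(default_factory=list)

    def ask(self, instruction: str) -> int:
        """Store one instruction and return its 1-based turn number."""
        self.turns.append(instruction.strip())
        return len(self.turns)

    def transcript(self) -> str:
        """Render the session as plain text, one line per turn."""
        lines = [f"[image] {self.base_image}"]
        lines += [f"turn {n}: {text}" for n, text in enumerate(self.turns, start=1)]
        return "\n".join(lines)


session = EditSession(base_image="CAT (Image)")
session.ask("Give this cat a detective hat and a monocle")
session.ask("turn this cat's detective hat and a monocle to batman mask.")
print(session.transcript())
```

Keeping the turns in order matters because each instruction in the examples above refers to the result of the previous one.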
2. Instruction Following (instruction-based generation)
GPT-4o accurately follows long prompts and complex instructions about objects.
(It can handle as many as 10 to 20 objects.)
Example Prompt
A square image containing a 4 row by 4 column grid containing 16 objects on a white background.
Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt
→ A 4x4 square grid, ordered left to right and top to bottom, with each object's attributes reflected accurately
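A numbered grid prompt like the one above is easy to get wrong by hand, for example by listing 15 items for a 4x4 grid. The sketch below is a hypothetical helper, not an official tool. It builds the prompt from a list and reports where each item should appear, assuming the left-to-right, top-to-bottom order stated in the prompt.

```python
def build_grid_prompt(objects: list[str], rows: int, cols: int) -> str:
    """Build a numbered grid prompt; the object count must fill the grid."""
    if len(objects) != rows * cols:
        raise ValueError(f"expected {rows * cols} objects, got {len(objects)}")
    header = (
        f"A square image containing a {rows} row by {cols} column grid "
        f"containing {len(objects)} objects on a white background.\n"
        "Go from left to right, top to bottom. Here's the list:"
    )
    items = [f"{i}. {name}" for i, name in enumerate(objects, start=1)]
    return "\n".join([header, *items])


def grid_position(index: int, cols: int) -> tuple[int, int]:
    """Map a 1-based list index to a 1-based (row, column) position."""
    row, col = divmod(index - 1, cols)
    return row + 1, col + 1


shapes = ["a blue star", "red triangle", "green square", "pink circle"]
print(build_grid_prompt(shapes, rows=2, cols=2))
print(grid_position(7, cols=4))  # item 7 of a 4-column grid
```

With the position helper, the generated image can be checked item by item against the list.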
Additional example
GPT-4o's image generation carries out even subtle requests.
show me a wine glass with only the tiniest drop of red wine in it.
→ Even very fine details can be rendered.
→ A relative expression such as "tiniest drop" is also visualized.
3. In-context Learning (context learning)
Based on reference images, it learns style, composition, and ideas and applies them to new images.
Example Prompt
Draw a design for a vehicle with triangular wheels using these images as references.
Label the front and back wheels, and at the bottom write:
“TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.” (in small caps)
→ Composes the vehicle's shape based on the reference images provided
→ Includes the text "TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI."
Additional example
an photorealistic image of a blue chainsaw
→ Even for a simple request, it renders highly detailed, realistic results grounded in photorealism
4. World Knowledge Integration (linking world knowledge)
GPT-4o generates images logically, drawing on the knowledge of the text model.
Example Prompt
Make me a professionally shot photorealistic diagram of the top selling cocktails in my bar with recipes labeled on each drink.
put the recipes on handwritten cards in front of each drink.
the cards are brown, and the text is black.
background is white
Title is "4 most popular cocktails"
→ Generates an image that integrates the cocktail lineup, recipes, and photographic style
→ Satisfies combined conditions such as handwritten cards, drink arrangement, and background
Additional example
make a visual infographic describing why SF is so foggy
→ Applies region-specific climate knowledge and turns it into visual material (Fog = visualizes the maritime climate, the influence of terrain, etc.)
5. Photorealism & Style Transfer (photorealism and style conversion)
Having learned a wide variety of styles, it can generate both realistic and artistic images.
Example Prompt
A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.
The text reads:
(left)
"Transfer between Modalities:
Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.
Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack
Cons:
* varying bit-rate across modalities
* compute not adaptive"
(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"
On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"
→ Precisely reproduces the specified camera angle, the natural look of the handwriting, and even the background windows and reflections
Additional example
A cat looking into a puddle of water on a street.
The reflection is that of a tiger, realistically distorted by ripples in the water.
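→ Renders even a metaphorical expression with precision
→ Depicts the rippled reflection in the water photorealistically
6. Text Rendering (text rendering accuracy)
The quality of text rendering inside images is excellent, making it practical for signs, menus, invitations, and similar uses.
Example Prompt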
Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.
Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)\n Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.
Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.
Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot
→ Fully reflects the complex arrangement of signs and the actual text to be rendered
→ Even humorous elements such as "Broom Parking for Witches…" are expressed in a realistic street-sign format
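When a prompt asks for exact wording on signs, it can be useful to pull out every quoted phrase and use the list as a checklist when inspecting the result. This is a minimal sketch with hypothetical helper names. It assumes the phrases are wrapped in straight double quotes and that `transcribed` is text a person typed out while reading the generated image.

```python
import re


def quoted_strings(prompt: str) -> list[str]:
    """Return every double-quoted phrase in the prompt, in order."""
    return re.findall(r'"([^"]+)"', prompt)


def missing_phrases(prompt: str, transcribed: str) -> list[str]:
    """List requested phrases that do not appear in the transcribed image text."""
    seen = transcribed.lower()
    return [p for p in quoted_strings(prompt) if p.lower() not in seen]


sign_prompt = (
    "including few ridiculous signs at the middle: "
    '"Broom Parking for Witches Not Permitted in Zone C" and '
    '"Magic Carpet Loading and Unloading Only (15-Minute Limit)"'
)
print(quoted_strings(sign_prompt))
print(missing_phrases(sign_prompt, "BROOM PARKING FOR WITCHES NOT PERMITTED IN ZONE C"))
```

The comparison ignores case because signs are often rendered in capitals even when the prompt uses mixed case.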
Additional example
I'm opening a traditional concept restaurant in Marin called Haein. It focuses on Korean food cooked with organic, farm-fresh ingredients, with a rotating menu based on what's seasonal. I want you to design an image - a menu incorporating the following menu items - lean into the traditional/rustic style while keeping it feeling upscale and sleek. Please also include illustrations of each dish in an elegant, peter rabbit style. Make sure all the text is rendered correctly, with a white background.
(Top)
Doenjang Jjigae (Fermented Soybean Stew) – $18 House-made doenjang with local mushrooms, tofu, and seasonal vegetables served with rice.
Galbi Jjim (Braised Short Ribs) – $34 Slow-braised local grass-fed beef ribs with pear and black garlic glaze, seasonal root vegetables, and jujube.
Grilled Seasonal Fish – Market Price ($22-$30) Whole or fillet of local, sustainable fish grilled over charcoal, served with perilla leaf ssam and house-made sauces.
Bibimbap – $19 Heirloom rice with a rotating selection of farm-fresh vegetables, house-fermented gochujang, and pasture-raised egg.
Bossam (Heritage Pork Wraps) – $28 Slow-cooked pork belly with napa cabbage wraps, oyster kimchi, perilla, and seasonal condiments.
(Bottom) Dessert & Drinks Seasonal Makgeolli (Rice Wine) – $12/glass
Rotating flavors based on seasonal fruits and flowers (persimmon, citrus, elderflower, etc.).
Hoddeok (Korean Sweet Pancake) – $9 Pan-fried cinnamon-stuffed pancake with black sesame ice cream.
→ Visualizes an upscale, traditional-style Korean restaurant menu
→ Includes Peter Rabbit style illustrations, with no typos in the text
Wrapping up
With this GPT-4o update, we could see that image generation now goes beyond simple generation: the possibilities have expanded to fine-grained control through prompts, multi-step refinement, and knowledge-based generation.
These were impressive examples showing that if you carefully put your intent into even a single prompt, you can produce far more polished and creative images.
OpenAI CEO Sam Altman emphasized that although this update is somewhat slower, it brings major advances in precision and expressiveness. Performance optimization is expected to improve gradually, so future updates are worth looking forward to.
Meanwhile, issues around copyright and the use of image styles, such as the Ghibli style, are increasingly coming to the fore, and how OpenAI and creators reach an ethical and legal balance will be an important point to watch.
What kind of images would you like to make?
What have you imagined into your prompts?
Experimenting yourself and creating your own work is a lot of fun too.
If you have interesting attempts or results, please share them in the comments or with a link!
Thank you for reading today!