Draw a Person Using Alphabet Letters
Background
The following prompt tests an LLM's ability to handle visual concepts despite being trained only on text. The task is challenging for the LLM, so it takes several iterations. In the example below, the user first requests a desired visual and then provides feedback along with corrections and additions. The follow-up instructions depend on the progress the LLM makes on the task.
Note: This task asks the LLM to generate TikZ code, which the user must then compile manually.
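Because the model returns TikZ source rather than an image, each iteration needs a compile step before the result can be judged. The following is a minimal sketch of that step, not part of the original example: the helper names `wrap_tikz` and `compile_tikz` and the file name `person.tex` are hypothetical, and it assumes a LaTeX installation that provides `pdflatex` and the `standalone` class.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def wrap_tikz(tikz_code: str) -> str:
    """Wrap TikZ code in a minimal standalone LaTeX document."""
    body = tikz_code.strip()
    # If the model returned only drawing commands, add the environment.
    if "\\begin{tikzpicture}" not in body:
        body = "\\begin{tikzpicture}\n" + body + "\n\\end{tikzpicture}"
    return (
        "\\documentclass[tikz,border=5pt]{standalone}\n"
        "\\begin{document}\n"
        + body
        + "\n\\end{document}\n"
    )


def compile_tikz(tikz_code: str, out_dir: str = ".") -> Optional[Path]:
    """Write person.tex (hypothetical name) and compile it if pdflatex exists.

    Returns the PDF path, or None when pdflatex is missing or compilation fails.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    tex_path = out / "person.tex"
    tex_path.write_text(wrap_tikz(tikz_code), encoding="utf-8")

    if shutil.which("pdflatex") is None:
        return None  # No LaTeX toolchain found; compile person.tex by hand.

    subprocess.run(
        [
            "pdflatex",
            "-interaction=nonstopmode",
            "-output-directory=" + str(out),
            str(tex_path),
        ],
        check=False,
        capture_output=True,
    )
    pdf_path = tex_path.with_suffix(".pdf")
    return pdf_path if pdf_path.exists() else None
```

Calling `compile_tikz(model_output, "build")` after each iteration gives a PDF to inspect before writing the next round of feedback.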
Prompt Iterations
Prompt Iteration 1: Initial Request
User Request:
Produce TikZ code that draws a person composed from letters in the alphabet. The arms and torso can be the letter Y, the face can be the letter O (add some facial features) and the legs can be the legs of the letter H. Feel free to add other features.
Prompt Iteration 2: Refinement
User Feedback:
The torso is a bit too long, the arms are too short and it looks like the right arm is carrying the face instead of the face being right above the torso. Could you correct this please?
Prompt Iteration 3: Adding Details
User Request:
Please add a shirt and pants.
Implementation
Code Example
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Send the Iteration 1 prompt as a single user message.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Produce TikZ code that draws a person composed from letters in the alphabet. The arms and torso can be the letter Y, the face can be the letter O (add some facial features) and the legs can be the legs of the letter H. Feel free to add other features."
        }
    ],
    temperature=1,
    max_tokens=1000,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)
API Parameters
| Parameter | Value | Description |
|---|---|---|
| model | gpt-4 | The language model to use |
| temperature | 1 | Sampling temperature; higher values give more random output |
| max_tokens | 1000 | Maximum number of tokens in the response |
| top_p | 1 | Nucleus sampling threshold; 1 considers the full token distribution |
| frequency_penalty | 0 | No penalty based on how often a token has already appeared |
| presence_penalty | 0 | No penalty based on whether a token has already appeared |
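The request shown earlier covers only Iteration 1. For the refinement and detail steps, the model needs the earlier turns, so each follow-up is appended to the running message list together with the model's previous replies. The sketch below illustrates that bookkeeping under stated assumptions: `run_iterations` and `fake_send` are hypothetical names, and the follow-ups are fixed here only for illustration, since in practice they depend on what the model produced.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

FIRST_PROMPT = (
    "Produce TikZ code that draws a person composed from letters in the "
    "alphabet. The arms and torso can be the letter Y, the face can be the "
    "letter O (add some facial features) and the legs can be the legs of "
    "the letter H. Feel free to add other features."
)

FOLLOW_UPS = [
    "The torso is a bit too long, the arms are too short and it looks like "
    "the right arm is carrying the face instead of the face being right "
    "above the torso. Could you correct this please?",
    "Please add a shirt and pants.",
]


def run_iterations(
    first_prompt: str,
    follow_ups: List[str],
    send: Callable[[List[Message]], str],
) -> List[Message]:
    """Send each prompt with the full history and record every reply."""
    messages: List[Message] = []
    for prompt in [first_prompt, *follow_ups]:
        messages.append({"role": "user", "content": prompt})
        # Pass a copy so the callee cannot mutate the running history.
        reply = send(list(messages))
        messages.append({"role": "assistant", "content": reply})
    return messages


def fake_send(messages: List[Message]) -> str:
    """Stand-in for a real model call; reports how many messages it saw."""
    return f"(reply after seeing {len(messages)} messages)"


history = run_iterations(FIRST_PROMPT, FOLLOW_UPS, fake_send)
for message in history:
    print(message["role"], ":", message["content"][:60])
```

To use a real model, `send` could wrap the client call from the code example, for instance by returning `client.chat.completions.create(model="gpt-4", messages=msgs).choices[0].message.content` for the given `msgs`.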
Key Learning Points
- Iterative Refinement: The process involves multiple iterations with user feedback
- Visual Concept Understanding: Testing the LLM's ability to understand spatial relationships
- Code Generation: Creating specific code (TikZ) for visual output
- Constraint-Based Generation: Using specific requirements to guide the output
Expected Output
The LLM should generate TikZ code that creates a visual representation of a person using alphabet letters:
- Torso and Arms: Formed by the letter Y
- Face: Formed by the letter O with facial features
- Legs: Formed by the legs of the letter H
- Additional Features: Shirt, pants, and other details as requested
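A rough automated check can flag responses that obviously miss the requirements above before anyone compiles them. The sketch below is a heuristic only, with hypothetical helper names (`extract_tikz`, `missing_letters`): it assumes the model places the letters as single-character node text such as `{\Huge Y}`, so it will report letters as missing when the model draws them with lines instead, and it says nothing about whether the figure looks right.

```python
import re
from typing import Optional, Set

EXPECTED_LETTERS = {"Y", "O", "H"}


def extract_tikz(response_text: str) -> Optional[str]:
    """Return the first tikzpicture environment in the response, if any."""
    match = re.search(
        r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
        response_text,
        flags=re.DOTALL,
    )
    return match.group(0) if match else None


def missing_letters(tikz_code: str) -> Set[str]:
    """Return expected letters that never appear as single-letter brace groups.

    A brace group counts when it holds optional font commands followed by
    exactly one capital letter, for example a group containing only Y.
    """
    found = set(
        re.findall(r"\{(?:\\[A-Za-z]+(?![A-Za-z])\s*)*([A-Z])\}", tikz_code)
    )
    return EXPECTED_LETTERS - found


# Hypothetical model response, used only to exercise the helpers.
sample_response = (
    "Here is the code:\n"
    "\\begin{tikzpicture}\n"
    "\\node at (0,2) {\\Huge O};\n"
    "\\node at (0,1) {\\Huge Y};\n"
    "\\node at (0,0) {\\Huge H};\n"
    "\\end{tikzpicture}\n"
    "Let me know if you want changes."
)

tikz = extract_tikz(sample_response)
if tikz is None:
    print("No tikzpicture environment found")
else:
    print("Missing letters:", sorted(missing_letters(tikz)))
```

Visual inspection of the compiled figure remains the real test; a check like this only catches responses that omit the TikZ environment or a required letter.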
Reference
Source: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (13 April 2023)
