
Draw a Person Using Alphabet Letters

Background

The following prompt tests an LLM's ability to handle visual concepts despite being trained only on text. The task is challenging for the LLM, so it involves several iterations. In the example below, the user first requests a desired visual and then provides feedback along with corrections and additions. The follow-up instructions depend on the progress the LLM makes on the task.

Note: This task asks the LLM to generate TikZ code, which the user must then compile manually.
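Since the model returns TikZ source rather than an image, a small helper can make the manual compile step less tedious. The following is a minimal sketch, not part of the original guide: the function names `wrap_tikz` and `compile_tex`, the file name `person.tex`, and the choice of the `standalone` document class are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Assumed template: the `standalone` class crops the page to the drawing.
LATEX_TEMPLATE = r"""\documentclass[tikz,border=5pt]{standalone}
\begin{document}
TIKZ_BODY
\end{document}
"""


def wrap_tikz(tikz_code: str) -> str:
    """Return a compilable LaTeX document containing the given TikZ code."""
    if r"\documentclass" in tikz_code:
        # The model already produced a full document; leave it untouched.
        return tikz_code
    return LATEX_TEMPLATE.replace("TIKZ_BODY", tikz_code.strip())


def compile_tex(source: str, workdir: str = "build") -> Optional[Path]:
    """Write the source to disk and run pdflatex if it is installed."""
    out_dir = Path(workdir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tex_path = out_dir / "person.tex"  # hypothetical file name
    tex_path.write_text(source, encoding="utf-8")
    if shutil.which("pdflatex") is None:
        return None  # no LaTeX toolchain found; compile the .tex file by hand
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={out_dir}", str(tex_path)],
        check=True,
    )
    return out_dir / "person.pdf"
```

Calling `compile_tex(wrap_tikz(tikz_code))` writes the `.tex` file and, if `pdflatex` is on the path, produces a PDF next to it. Model-generated TikZ may not compile on the first try, so expect to read the LaTeX log and feed errors back as another iteration.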

Prompt Iterations

Prompt Iteration 1: Initial Request

User Request:

Produce TikZ code that draws a person composed from letters in the alphabet. The arms and torso can be the letter Y, the face can be the letter O (add some facial features) and the legs can be the legs of the letter H. Feel free to add other features.

Prompt Iteration 2: Refinement

User Feedback:

The torso is a bit too long, the arms are too short and it looks like the right arm is carrying the face instead of the face being right above the torso. Could you correct this please?

Prompt Iteration 3: Adding Details

User Request:

Please add a shirt and pants.
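The three iterations above form a single conversation: each follow-up only makes sense if the model can see its own earlier TikZ output. One way to keep that history is to interleave user turns with the replies received so far. This is a sketch under assumptions: `build_messages` is a hypothetical helper, and the assistant replies shown in the usage note are placeholders, not real model output.

```python
# The three user turns from the iterations above, verbatim.
USER_TURNS = [
    "Produce TikZ code that draws a person composed from letters in the "
    "alphabet. The arms and torso can be the letter Y, the face can be the "
    "letter O (add some facial features) and the legs can be the legs of the "
    "letter H. Feel free to add other features.",
    "The torso is a bit too long, the arms are too short and it looks like "
    "the right arm is carrying the face instead of the face being right above "
    "the torso. Could you correct this please?",
    "Please add a shirt and pants.",
]


def build_messages(user_turns, assistant_replies):
    """Interleave user turns with the assistant replies received so far.

    The result ends with a user turn when one reply is still pending.
    """
    if len(assistant_replies) not in (len(user_turns) - 1, len(user_turns)):
        raise ValueError("expected one reply per user turn, or one fewer")
    messages = []
    for i, user_text in enumerate(user_turns):
        messages.append({"role": "user", "content": user_text})
        if i < len(assistant_replies):
            messages.append(
                {"role": "assistant", "content": assistant_replies[i]}
            )
    return messages
```

For the second iteration, `build_messages(USER_TURNS[:2], [first_reply])` yields a user, assistant, user sequence that can be passed as the `messages` argument of a chat completion request, where `first_reply` is whatever TikZ the model returned for the initial request.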

Implementation

Code Example

python
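from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Send the initial request (Prompt Iteration 1) as a single user message.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Produce TikZ code that draws a person composed from letters in the alphabet. The arms and torso can be the letter Y, the face can be the letter O (add some facial features) and the legs can be the legs of the letter H. Feel free to add other features."
        }
    ],
    temperature=1,
    max_tokens=1000,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

# Print the model's reply, which should contain the TikZ code to compile.
print(response.choices[0].message.content)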
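The model's reply usually mixes explanation with code, so the TikZ picture has to be pulled out before compiling. The snippet below is a minimal sketch, assuming the reply contains a single `tikzpicture` environment; `extract_tikz` is a hypothetical helper, not part of any library.

```python
import re
from typing import Optional

# Non-greedy match from \begin{tikzpicture} to the first \end{tikzpicture}.
TIKZ_PATTERN = re.compile(
    r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL
)


def extract_tikz(reply_text: str) -> Optional[str]:
    """Return the first tikzpicture environment in the reply, or None."""
    match = TIKZ_PATTERN.search(reply_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Placeholder reply for illustration only, not real model output.
    sample_reply = (
        "Here is the drawing:\n"
        "\\begin{tikzpicture}\n"
        "\\node at (0,2) {O};\n"
        "\\end{tikzpicture}\n"
        "Let me know if you want changes."
    )
    print(extract_tikz(sample_reply))
```

With version 1 or later of the `openai` Python client, the reply text is available as `response.choices[0].message.content` and can be passed to `extract_tikz` directly. A `None` result is a signal to ask the model again or to inspect the reply by hand.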

API Parameters

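| Parameter | Value | Description |
|---|---|---|
| model | gpt-4 | The language model to use |
| temperature | 1 | Controls randomness in generation |
| max_tokens | 1000 | Maximum number of tokens in the response |
| top_p | 1 | Nucleus sampling parameter |
| frequency_penalty | 0 | No penalty based on how often a token has already appeared |
| presence_penalty | 0 | No penalty for tokens that have already appeared |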

Key Learning Points

  1. Iterative Refinement: The task is solved over multiple iterations of user feedback
  2. Visual Concept Understanding: The prompt tests the LLM's ability to reason about spatial relationships
  3. Code Generation: The LLM must produce specific code (TikZ) that renders a visual output
  4. Constraint-Based Generation: Specific requirements guide the output

Expected Output

The LLM should generate TikZ code that creates a visual representation of a person using alphabet letters:

  • Torso and Arms: Formed by the letter Y
  • Face: Formed by the letter O with facial features
  • Legs: Formed by the legs of the letter H
  • Additional Features: Shirt, pants, and other details as requested

Reference

Source: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (13 April 2023)