Gemma
Overview
Google DeepMind releases Gemma, a series of open language models inspired by the same research and technology used to create Gemini. The Gemma model release includes:
- 2B model: Trained on 2T tokens
- 7B model: Trained on 6T tokens
- Base and instruction-tuned checkpoints
- Context length: 8192 tokens
- Performance: Generally outperforms Llama 2 7B and Mistral 7B on several benchmarks
Architecture
The Gemma model architecture is based on the transformer decoder with improvements including:
- Multi-query attention (used by the 2B model; see the sketch after this list)
- Multi-head attention (used by the 7B model)
- RoPE embeddings
- GeGLU activations
- Normalizer location: RMSNorm applied to both the input and the output of each transformer sub-layer
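To make the attention distinction concrete, here is a minimal PyTorch sketch of multi-query attention, where a single key/value head is shared across all query heads (the mechanism used by the 2B model). This is illustrative only, not Gemma's actual implementation; the function name, shapes, and weights below are our own assumptions.

```python
import torch
import torch.nn.functional as F

def multi_query_attention(x, w_q, w_k, w_v, num_heads):
    """Illustrative multi-query attention: all query heads share one K/V head."""
    b, s, d = x.shape
    head_dim = d // num_heads
    # Queries keep num_heads heads; keys/values are projected to a single head.
    q = (x @ w_q).view(b, s, num_heads, head_dim).transpose(1, 2)  # (b, h, s, hd)
    k = (x @ w_k).view(b, s, 1, head_dim).transpose(1, 2)          # (b, 1, s, hd)
    v = (x @ w_v).view(b, s, 1, head_dim).transpose(1, 2)          # (b, 1, s, hd)
    # The single K/V head is broadcast across all query heads.
    scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5           # (b, h, s, s)
    attn = F.softmax(scores, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(b, s, d)              # (b, s, d)
    return out

# Toy usage: d_model=64, 8 query heads sharing one 8-dim K/V head.
x = torch.randn(2, 16, 64)
out = multi_query_attention(
    x,
    w_q=torch.randn(64, 64),
    w_k=torch.randn(64, 8),
    w_v=torch.randn(64, 8),
    num_heads=8,
)
print(out.shape)  # torch.Size([2, 16, 64])
```

In multi-head attention, by contrast, each query head has its own key/value projection; sharing a single K/V head mainly reduces memory traffic during decoding.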
Training Data
- Gemma 2B: Trained on 2T tokens
- Gemma 7B: Trained on 6T tokens
- Content: Web documents, mathematics, and code
- Language: English only (no explicit multilingual training)
- Multimodal: No explicit multimodal capabilities
Technical Specifications
- Vocabulary size: 256K tokens
- Tokenization: Subset of SentencePiece tokenizer from Gemini
- Features: Preserves whitespace, splits digits, byte-level encodings for unknown tokens
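As a quick way to inspect these tokenizer properties, the sketch below loads the tokenizer from the Hugging Face Hub. It assumes the google/gemma-7b-it checkpoint and that you have accepted the model's access terms; the exact token strings it prints may differ from what the comments suggest.

```python
from transformers import AutoTokenizer

# Assumes access to the google/gemma-7b-it checkpoint on the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

print(tokenizer.vocab_size)                   # vocabulary size on the order of 256K
print(tokenizer.tokenize("def add(a, b):"))   # whitespace inside code is preserved
print(tokenizer.tokenize("In 2024"))          # digits are split into individual tokens
```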
Instruction Tuning
The instruction-tuned models are tuned using:
- Supervised fine-tuning on a mix of text-only synthetic and human-generated prompt-response pairs
- Reinforcement learning from human feedback (RLHF) with:
  - A reward model trained on labeled preference data
  - A policy based on a set of high-quality prompts
Note: All datasets used are English only.
Control Tokens
The instruction-tuned models use specific formatting control tokens to indicate roles and turns in a conversation (see the Control Tokens Reference table in the Prompt Format section below).
Results
Performance Capabilities
The Gemma 7B model demonstrates strong performance on math, science, and code-related tasks. The reported scores correspond to the average scores on academic benchmark evaluations grouped by capability.

Benchmark Comparison
Gemma 7B outperforms Llama 2 7B and Mistral 7B on various academic benchmarks with notable performance on:
- HumanEval
- GSM8K
- MATH
- AGIEval
It also shows improved performance on:
- Reasoning
- Dialogue
- Mathematics
- Code
Safety Evaluation
The Gemma 7B instruction-tuned models also outperform the Mistral-7B v0.2 Instruct model on safety and instruction following as evaluated by humans.

Gemma is also evaluated on several academic safety benchmarks and compared with Mistral. The technical report mentions the use of:
- Debiasing techniques
- Red-teaming to help mitigate common risks associated with LLMs
Resources: You can find more information on how to responsibly develop with Gemma in the model card and Responsible Generative AI toolkit.
Prompt Format
Base Models
The Gemma base models don't use any specific prompt format but can be prompted to perform tasks through zero-shot/few-shot prompting.
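For example, a hypothetical few-shot prompt for the base model could look like the following, where the model is expected to continue the pattern (no control tokens needed):
Translate English to French:
sea otter => loutre de mer
cheese => fromage
plush giraffe =>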
Instruction Model Format
The Gemma Instruct model uses the following format:
<start_of_turn>user
Generate a Python function that multiplies two numbers <end_of_turn>
<start_of_turn>model
Control Tokens Reference
| Context | Relevant Token |
|---|---|
| User turn | user |
| Model turn | model |
| Start of conversation turn | <start_of_turn> |
| End of conversation turn | <end_of_turn> |
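Putting these control tokens together, here is a minimal Python sketch of a helper that assembles a Gemma-style prompt from a list of turns. The function name and message format are our own, not part of any official API; in practice you can also rely on a library's chat template instead of hand-building the string.

```python
def build_gemma_prompt(messages):
    """Assemble a Gemma instruct prompt from [{'role': 'user' | 'model', 'content': ...}] turns."""
    prompt = ""
    for message in messages:
        prompt += f"<start_of_turn>{message['role']}\n{message['content']}<end_of_turn>\n"
    # Finish with an open model turn so the model knows it should respond next.
    prompt += "<start_of_turn>model\n"
    return prompt

print(build_gemma_prompt([
    {"role": "user", "content": "What is a good place for travel in the US?"},
    {"role": "model", "content": "California."},
    {"role": "user", "content": "What can I do in California?"},
]))
```

The printed string matches the multi-turn example shown in the next section.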
Multi-turn Example
You can also use the special control tokens in the context of a multi-turn user prompt:
<start_of_turn>user
What is a good place for travel in the US?<end_of_turn>
<start_of_turn>model
California.<end_of_turn>
<start_of_turn>user
What can I do in California?<end_of_turn>
<start_of_turn>model
How to Prompt Gemma 7B
Prompting Gemma 7B effectively requires using the prompt template properly. The following examples demonstrate how to use the Gemma 7B Instruct prompt template for various tasks.
Zero-shot Prompting
As with any model, you can leverage Gemma's zero-shot capabilities:
<start_of_turn>user
Explain why the sky is blue<end_of_turn>
<start_of_turn>model
Zero-shot Prompting with System Prompt
Adding a system role or system prompt helps to steer LLMs better. While there is no explicit system role in Gemma, you can add additional instructions:
<start_of_turn>user
Answer the following question in a concise and informative manner:
Explain why the sky is blue<end_of_turn>
<start_of_turn>model
Note: We added "Answer the following question in a concise and informative manner:" as an additional instruction to steer the model better.
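If you run the instruction-tuned checkpoint through the Hugging Face transformers library, its chat template applies this format for you. The sketch below is illustrative and assumes the google/gemma-7b-it checkpoint, access to the weights, and a GPU with enough memory; since there is no explicit system role, the steering instruction is folded into the user turn.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"  # assumes you have access to this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# No explicit system role: prepend the steering instruction to the user turn.
messages = [
    {
        "role": "user",
        "content": "Answer the following question in a concise and informative manner:\n"
                   "Explain why the sky is blue",
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```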
Role Playing
We can test Gemma for role-playing capabilities, useful for personalizing model responses and building complex domain-specific LLM agents.
Prompt:
<start_of_turn>user
You are a helpful 2nd-grade teacher. Help a 2nd grader to answer questions in a short and clear manner.
Explain why the sky is blue<end_of_turn>
<start_of_turn>model
Model Response:
Sure, here is an explanation for why the sky is blue:
The sky appears blue because of a phenomenon called Rayleigh Scattering.
Here's the science behind it:
- Sunlight is made up of different colors of light, like red, green, blue, and violet.
- When sunlight hits the Earth's atmosphere, the different colors of light scatter in different directions.
- Blue light scatters more easily than other colors because it has shorter wavelengths.
- The scattered blue light travels in all directions, including towards your eyes.
- That's why we see the sky as blue.
It's important to note that this explanation is simplified and does not account for all scientific details. However, it gives you a good understanding of the basic science behind why the sky appears blue.
Note: We haven't checked the response for accuracy or suitability for a 2nd grader. You can compare this with the chain-of-thought prompt response that follows.
Reasoning
You can elicit Gemma's reasoning capabilities using zero-shot chain-of-thought prompting:
Prompt:
<start_of_turn>user
Think and write your step-by-step reasoning before responding.
Explain why the sky is blue.<end_of_turn>
<start_of_turn>model
Model Response (truncated for brevity):
Step 1: Scattering of Light The sky appears blue due to the phenomenon of scattering of light. When sunlight hits the Earth's atmosphere, it interacts with the particles of air, primarily nitrogen and oxygen molecules.
Step 2: Rayleigh Scattering Rayleigh scattering is the specific scattering mechanism that occurs when light interacts with particles much smaller than the wavelength of light. In this process, shorter wavelengths (blue light) are scattered more effectively than longer wavelengths (red light).
Step 3: Blue Scattered Light Travels in All Directions ...
Conclusion The sky appears blue due to the scattering of light, specifically Rayleigh scattering. Shorter wavelengths of light are scattered more effectively, and this scattered light travels in all directions, with the scattered light that travels opposite to the sun's rays being most visible to our eyes.
Notes:
- Gemma can output markdown format automatically
- We've lightly edited the text formatting for demonstration and simplicity, but the content is exactly what the model responded with
- We haven't assessed the response for accuracy or potential hallucination
Resources and Integrations
Here are several resources and integrations that were part of the Gemma release:
- Colab and Kaggle notebooks
- Hugging Face models
- MaxText
- NVIDIA NeMo
- TensorRT-LLM
- NVIDIA AI Playground (Gemma 7B available)
Commercial Usage
According to the official blog release, the Terms of Use permit responsible commercial usage and distribution for all organizations, regardless of size.
Key Takeaways
- Open Models: Open-weight language models based on Gemini research and technology
- Strong Performance: Outperforms comparable models on academic benchmarks
- Safety Focus: Comprehensive safety evaluation and debiasing techniques
- Flexible Prompting: Support for zero-shot, few-shot, and role-playing
- Reasoning Capabilities: Chain-of-thought prompting for complex reasoning
- Commercial Ready: Permits responsible commercial usage
- Multiple Integrations: Available across major AI platforms
References
- Gemma: Introducing new state-of-the-art open models
- Gemma: Open Models Based on Gemini Research and Technology
- Responsible Generative AI Toolkit
- Fast Transformer Decoding: One Write-Head is All You Need
- Roformer: Enhanced transformer with rotary position embedding
- GLU variants improve transformer
- Root mean square layer normalization
