OLMo

Overview

In this guide, we provide an overview of the Open Language Model (OLMo), including prompts and usage examples. The guide also includes tips, applications, limitations, papers, and additional reading materials related to OLMo.

Introduction to OLMo

The Allen Institute for AI (AI2) has released a new open language model and framework called OLMo. This effort is meant to provide full access to data, training code, models, and evaluation code in order to accelerate the collective study of language models.

Model Variants

Their first release includes:

  • Four variants at the 7B parameter scale
  • One model at the 1B scale
  • All trained on at least 2T tokens

Future: This marks the first of many releases, including an upcoming 65B OLMo model.

Figure: OLMo models

Complete Release Package

The releases include:

  • Full training data, including the code that produces the data
  • Full model weights, training code, logs, metrics, and inference code
  • Several checkpoints per model
  • Evaluation code
  • Fine-tuning code

Licensing

All the code, weights, and intermediate checkpoints are released under the Apache 2.0 License.

OLMo-7B

Architecture

Both the OLMo-7B and OLMo-1B models adopt a decoder-only transformer architecture with improvements borrowed from models such as PaLM and Llama (see the sketch after the following list):

  • No biases
  • Non-parametric layer norm
  • SwiGLU activation function
  • Rotary positional embeddings (RoPE)
  • Vocabulary: 50,280 tokens
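
As a rough illustration of these architectural choices, here is a minimal PyTorch sketch of a single decoder block with bias-free projections, layer norms without learnable parameters, and a SwiGLU feed-forward. The dimensions are placeholders, rotary position embeddings and causal masking are omitted, and this is not OLMo's actual implementation (the released training code contains the real one).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Bias-free linear layers, matching the "no biases" design choice
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class DecoderBlock(nn.Module):
    """One pre-norm decoder block; dimensions are illustrative placeholders."""
    def __init__(self, d_model: int = 4096, n_heads: int = 32, d_hidden: int = 11008):
        super().__init__()
        # elementwise_affine=False gives a layer norm with no learnable parameters
        self.attn_norm = nn.LayerNorm(d_model, elementwise_affine=False)
        self.ffn_norm = nn.LayerNorm(d_model, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(d_model, n_heads, bias=False, batch_first=True)
        self.ffn = SwiGLU(d_model, d_hidden)

    def forward(self, x: torch.Tensor, attn_mask=None) -> torch.Tensor:
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        return x + self.ffn(self.ffn_norm(x))
```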

Dolma Dataset

Dataset Overview

This release also includes a pre-training dataset called Dolma -- a diverse, multi-source corpus of 3 trillion tokens across 5B documents acquired from 7 different data sources.

Figure: Dolma dataset

Data Processing

The creation of Dolma involves several steps (a conceptual sketch follows the list):

  • Language filtering
  • Quality filtering
  • Content filtering
  • Deduplication
  • Multi-source mixing
  • Tokenization
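
The sketch below illustrates what such a pipeline looks like conceptually. It is not Dolma's actual processing code (AI2 releases that separately); the filter callables and the hash-based deduplication are stand-ins for illustration.

```python
import hashlib
from typing import Callable, Iterable, Iterator, List

def dedup_key(text: str) -> str:
    """Hash key for exact-match deduplication (a stand-in for the real strategy)."""
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

def process_source(
    documents: Iterable[str],
    is_target_language: Callable[[str], bool],
    passes_quality: Callable[[str], bool],
    passes_content_filter: Callable[[str], bool],
    tokenize: Callable[[str], List[int]],
) -> Iterator[List[int]]:
    """Schematic single-source pipeline mirroring the steps listed above.

    In a multi-source corpus like Dolma, one such pipeline would run per data
    source, and the resulting streams would be mixed according to target
    proportions before the tokenized documents are written out.
    """
    seen = set()
    for doc in documents:
        if not is_target_language(doc):       # language filtering
            continue
        if not passes_quality(doc):           # quality filtering (heuristics, length, etc.)
            continue
        if not passes_content_filter(doc):    # content filtering (toxicity, PII, ...)
            continue
        key = dedup_key(doc)                  # deduplication
        if key in seen:
            continue
        seen.add(key)
        yield tokenize(doc)                   # tokenization
```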

Training Details

The training dataset consists of a 2T-token sample from Dolma (a sketch of the chunking scheme follows the list):

  • A special EOS token is appended to the end of each document, and the documents are concatenated into a single token stream
  • Training instances are formed from consecutive chunks of 2048 tokens
  • Chunks are shuffled during training
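
A minimal sketch of this concatenate-and-chunk scheme is shown below; the EOS token id and the inputs are placeholders, and this is not the actual OLMo data-loading code.

```python
import random
from typing import Iterable, List

EOS_TOKEN_ID = 50279   # placeholder id for the special end-of-document token
CHUNK_SIZE = 2048      # training sequence length described above

def build_training_chunks(tokenized_docs: Iterable[List[int]], seed: int = 0) -> List[List[int]]:
    """Append EOS to each document, concatenate everything into one stream,
    cut it into consecutive 2048-token chunks, and shuffle the chunks."""
    stream: List[int] = []
    for tokens in tokenized_docs:
        stream.extend(tokens)
        stream.append(EOS_TOKEN_ID)
    chunks = [
        stream[i:i + CHUNK_SIZE]
        for i in range(0, len(stream) - CHUNK_SIZE + 1, CHUNK_SIZE)
    ]
    random.Random(seed).shuffle(chunks)
    return chunks
```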

Note: More training details and hardware specifications can be found in the paper.

Results

Evaluation Framework

The models are evaluated on downstream tasks using Catwalk. OLMo models are compared to several publicly available models like Falcon and Llama 2.

Evaluation Tasks

The models are evaluated on tasks that measure commonsense reasoning abilities (a sketch of the scoring method follows the list):

  • Datasets: piqa, hellaswag, and similar commonsense reasoning benchmarks (nine end-tasks in total)
  • Method: Zero-shot evaluation using rank classification (candidate completions are ranked by likelihood)
  • Metric: Accuracy
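
To make the evaluation method concrete, here is a sketch of zero-shot rank classification with Hugging Face transformers. The checkpoint name, the made-up PIQA-style item, and the unnormalized log-likelihood scoring are assumptions for illustration; the paper's own evaluation runs through Catwalk.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-7B-hf"   # assumed checkpoint name on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` given `prompt`.

    Assumes the prompt's tokenization is a prefix of the full tokenization,
    which is approximately true for simple prompts like the one below.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    start = prompt_ids.shape[1]               # only score the completion tokens
    target = full_ids[0, start:]
    scores = log_probs[0, start - 1:, :].gather(1, target.unsqueeze(-1))
    return scores.sum().item()

def rank_classify(prompt: str, choices):
    """Zero-shot rank classification: return the highest-likelihood completion."""
    return max(choices, key=lambda c: completion_logprob(prompt, c))

# Example in the spirit of PIQA-style commonsense questions (made-up item)
print(rank_classify("To open a jar with a tight lid, you should",
                    [" run the lid under hot water first.",
                     " freeze the jar overnight first."]))
```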

Performance Summary

OLMo-7B:

  • Outperforms all other models on 2 of the end-tasks
  • Ranks in the top 3 on 8 of 9 end-tasks

Figure: OLMo results

Prompting Guide for OLMo

Coming soon...

Figures source: OLMo: Accelerating the Science of Language Models

Key Takeaways

  1. Full Openness: Complete access to data, code, weights, and checkpoints
  2. Multiple Scales: 1B, 7B, and upcoming 65B parameter variants
  3. Advanced Architecture: Decoder-only transformer with modern improvements
  4. Quality Dataset: 3T token Dolma corpus from diverse sources
  5. Strong Performance: Top-3 performance on 8/9 commonsense reasoning tasks
  6. Research Focus: Designed to accelerate language model science
  7. Apache 2.0 License: Fully open source for research and development

References