
LLaMA: Open and Efficient Foundation Language Models

Overview

Note: This section is under heavy development.

What's New?

This paper introduces a collection of foundation language models ranging from 7B to 65B parameters.

Training Data

The models are trained on trillions of tokens using only publicly available datasets.

Research Context

The work by Hoffmann et al. (2022) shows that, given a fixed compute budget, smaller models trained on more data can outperform larger counterparts. That work recommends, for example, training a 10B model on 200B tokens.

Key Finding: However, the LLaMA paper finds that the performance of a 7B model continues to improve even after 1T tokens.
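To make the scaling argument concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes two common approximations that are not stated on this page: training compute is roughly 6·N·D FLOPs for N parameters and D tokens, and the Chinchilla recipe corresponds to roughly 20 training tokens per parameter. Treat the numbers as illustrative only.

```python
# Back-of-the-envelope sketch (illustrative assumptions, not figures from the paper):
#   * training compute C ~= 6 * N * D FLOPs for N parameters and D tokens
#   * the Chinchilla recipe corresponds to roughly 20 training tokens per parameter

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs, using C ~= 6 * N * D."""
    return 6.0 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal token count under the ~20 tokens/parameter heuristic."""
    return tokens_per_param * n_params

# Hoffmann et al. (2022): a 10B model is compute-optimal at about 200B tokens.
print(f"10B model, compute-optimal tokens: {chinchilla_optimal_tokens(10e9):.1e}")   # ~2.0e11

# LLaMA-7B is instead trained on 1T tokens, roughly 7x past its compute-optimal
# point, and the paper reports that its performance keeps improving.
print(f"7B model, compute-optimal tokens:  {chinchilla_optimal_tokens(7e9):.1e}")    # ~1.4e11
print(f"7B model on 1T tokens, training FLOPs: {train_flops(7e9, 1e12):.1e}")        # ~4.2e22
```

Under these assumptions, a 10B model is compute-optimal near 200B tokens, while LLaMA-7B's 1T-token run sits far past that point, which is exactly the trade-off the paper embraces.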

Research Focus

LLaMA Research Overview

This work focuses on training models (LLaMA) that achieve the best possible performance at various inference budgets by training on more tokens than is typically used.
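The inference-budget argument can also be sketched with a rough rule of thumb: a dense transformer spends on the order of 2·N FLOPs per generated token for N parameters. That constant is an assumption for illustration, not a figure from the paper.

```python
# Rough inference-cost sketch (assumption: ~2 * N FLOPs per generated token for
# a dense N-parameter transformer; this constant is illustrative, not from the paper).

def inference_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass compute per token, ~2 * N FLOPs."""
    return 2.0 * n_params

llama_13b = inference_flops_per_token(13e9)
gpt3_175b = inference_flops_per_token(175e9)

print(f"LLaMA-13B : {llama_13b:.1e} FLOPs/token")
print(f"GPT-3 175B: {gpt3_175b:.1e} FLOPs/token")
print(f"LLaMA-13B is ~{gpt3_175b / llama_13b:.0f}x cheaper per generated token")
```

The roughly 13x gap per token is the payoff the paper targets: extra training compute is paid once, while inference savings accrue on every request.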

Capabilities & Key Results

Performance Comparison

LLaMA-13B outperforms GPT-3 (175B) on most benchmarks despite being:

  • 10x smaller
  • possible to run on a single GPU (see the loading sketch below)

LLaMA-65B is competitive with the best existing models, such as:

  • Chinchilla-70B
  • PaLM-540B
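As a concrete illustration of the single-GPU point above, the sketch below loads a 13B LLaMA-class checkpoint in half precision with the Hugging Face transformers library. This is not the paper's original codebase; the model identifier is a placeholder (the official weights were distributed separately under a research license), and fp16 weights for 13B parameters need roughly 26 GB of GPU memory, so 8-bit or 4-bit quantization may be required on smaller cards.

```python
# Hedged sketch: running a 13B LLaMA-class checkpoint on a single GPU with the
# Hugging Face `transformers` library (not the paper's original codebase).
# `model_id` is a placeholder; the official LLaMA weights must be obtained separately.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "path/to/a-llama-13b-checkpoint"  # placeholder, not a real hub id

tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: ~26 GB of weights for 13B parameters
    device_map="auto",          # requires `accelerate`; maps layers onto the available GPU
)

prompt = "LLaMA is a collection of foundation language models that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```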

Resources

Paper

LLaMA: Open and Efficient Foundation Language Models

Code

GitHub Repository


Key Takeaways

  1. Efficient Scaling: Smaller models trained on more data can outperform larger counterparts
  2. Continuous Improvement: 7B model performance keeps improving even after 1T tokens
  3. Cost-Effective: LLaMA-13B outperforms GPT-3 (175B) while being 10x smaller
  4. Single GPU Compatible: LLaMA-13B can run on a single GPU
  5. Competitive Performance: LLaMA-65B competes with state-of-the-art models such as Chinchilla-70B and PaLM-540B
  6. Open Foundation: Provides a base for many derivative models and applications