Mixtral 8x22B

Overview

Mixtral 8x22B is an open large language model (LLM) released by Mistral AI. It is a sparse mixture-of-experts (SMoE) model that activates 39B parameters out of a total of 141B for each token.
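
In a sparse mixture-of-experts architecture, a router sends each token to a small subset of expert feed-forward networks, so only a fraction of the total parameters participates in any single forward pass. The toy PyTorch sketch below illustrates top-2 routing over 8 experts, the configuration Mixtral uses; the dimensions, router, and expert networks are illustrative assumptions, not Mixtral's actual implementation.

```python
# Toy sketch of sparse mixture-of-experts (top-2) routing.
# Dimensions, router, and experts are illustrative, not Mixtral's internals.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (tokens, dim)
        logits = self.router(x)                              # (tokens, num_experts)
        weights, chosen = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                 # normalize over the top-k only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out  # each token was processed by only top_k of the experts

layer = ToyMoELayer()
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

With 8 experts and top-2 routing, each token touches only a quarter of the expert parameters (plus the shared attention weights), which is how 141B total parameters reduce to roughly 39B active ones per token.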

Capabilities

Mixtral 8x22B is trained to be a cost-efficient model with capabilities that include:

  • Multilingual understanding
  • Math reasoning
  • Code generation
  • Native function calling support
  • Constrained output support (both sketched in the example after this list)
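
The following sketch illustrates function calling with the mistralai Python client (v1.x). The model identifier, API-key handling, and the get_weather tool schema are assumptions for illustration; consult the official client documentation for the current interface.

```python
# Minimal function-calling sketch using the mistralai Python client (v1.x).
# The model name and the get_weather tool are illustrative assumptions;
# check the official Mistral AI docs for the current interface.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.complete(
    model="open-mixtral-8x22b",  # assumed API model identifier
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

# Instead of free text, the model can return a structured tool call.
print(response.choices[0].message.tool_calls)
```

In recent API versions, constrained JSON output can be requested on a similar call by passing response_format={"type": "json_object"}.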

The model supports a context window of 64K tokens, which enables accurate information recall from large documents.

Mistral AI claims that Mixtral 8x22B delivers one of the best performance-to-cost ratios among community models and that it is significantly faster than comparably sized dense models thanks to its sparse activations.

Figure: Mixtral 8x22B performance (Source: Mistral AI Blog)

Results

Performance on Reasoning and Knowledge Benchmarks

According to the officially reported results, Mixtral 8x22B (with 39B active parameters) outperforms state-of-the-art open models such as:

  • Command R+
  • Llama 2 70B

on several reasoning and knowledge benchmarks like:

  • MMLU
  • HellaSwag
  • TriviaQA
  • Natural Questions

and many others.

Figure: Mixtral 8x22B reasoning and knowledge performance (Source: Mistral AI Blog)

Performance on Coding and Math Tasks

Mixtral 8x22B outperforms all open models on coding and math tasks when evaluated on benchmarks such as:

  • GSM8K
  • HumanEval
  • MATH

Mixtral 8x22B Instruct is reported to achieve a score of 90% on GSM8K (maj@8).
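
Here, maj@8 means each problem is sampled 8 times and the most frequent final answer is submitted. A short sketch of that scoring rule, using a hypothetical generate_answer helper, looks like this:

```python
# Sketch of maj@8 scoring: sample 8 answers per problem and take the
# majority vote. generate_answer is a hypothetical helper that samples
# one completion and extracts the final numeric answer as a string.
from collections import Counter

def maj_at_k(problem: str, gold: str, generate_answer, k: int = 8) -> bool:
    answers = [generate_answer(problem) for _ in range(k)]  # k independent samples
    majority, _ = Counter(answers).most_common(1)[0]        # most frequent answer
    return majority == gold

# Accuracy over a dataset of (problem, gold_answer) pairs would then be:
# score = sum(maj_at_k(p, g, generate_answer) for p, g in dataset) / len(dataset)
```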

Figure: Mixtral 8x22B coding and math performance (Source: Mistral AI Blog)

Usage

More information on Mixtral 8x22B and how to use it can be found in the official Mistral AI documentation.
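
As a rough starting sketch, the open weights can also be loaded with Hugging Face transformers. The repository ID below follows Mistral AI's Hugging Face naming and, together with the generation settings, should be treated as an assumption to verify against the documentation; the full 141B-parameter model also requires substantial GPU memory.

```python
# Sketch of loading the open weights with Hugging Face transformers.
# Repository ID and generation settings are assumptions to verify;
# device_map="auto" additionally requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize sparse mixture-of-experts in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```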

License

The model is released under an Apache 2.0 license.

Key Takeaways

  1. Sparse mixture-of-experts model with 39B active parameters
  2. 64K token context window
  3. Superior performance on reasoning and knowledge benchmarks
  4. Strong performance on coding and math tasks (90% on GSM8K maj@8)
  5. Open license Apache 2.0
  6. Claimed one of the best performance-to-cost ratios among community models