
Mixtral 8x7B - Mistral AI

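Mixtral 8x7B is a high-quality sparse mixture-of-experts model with 6x faster inference than Llama 2 70B. It offers a strong cost/performance trade-off, is multilingual and fine-tunable, and matches or outperforms GPT3.5 on most standard benchmarks.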

Model Overview

  • High-quality sparse mixture-of-experts model
  • Licensed under Apache 2.0
  • Outperforms Llama 2 70B on most benchmarks, with 6x faster inference
  • Best overall cost/performance trade-off; matches or outperforms GPT3.5 on most standard benchmarks
  • Handles a context of 32k tokens
  • Supports English, French, Italian, German, and Spanish
  • Strong performance in code generation
  • Can be fine-tuned into an instruction-following model that scores 8.3 on MT-Bench

Sparse Architectures

  • Sparse mixture-of-experts network
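  • Decoder-only model whose feedforward block picks from 8 distinct groups of parameters (the "experts")
  • At every layer, a router network chooses two experts for each token and combines their outputs additively
  • 46.7B total parameters, but only 12.9B are used per token, so inference cost is comparable to that of a 12.9B model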
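
The routing and parameter counts above can be illustrated with a small sketch. It assumes that the router picks the two highest-scoring experts per token and that gate weights come from a softmax over just those two scores. The toy experts, the helper names (`top2_moe`, `make_expert`, `split_parameters`), and the split into "shared" and "per-expert" parameters are hypothetical, chosen for illustration and not taken from Mistral AI's code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(token, router_logits, experts):
    """Route one token vector through the two highest-scoring experts.

    Assumption: gate weights are a softmax over the two selected logits.
    Returns the combined output vector and the chosen expert indices.
    """
    # Indices of the two largest router logits (ties keep index order).
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]
    gates = softmax([router_logits[i] for i in chosen])
    # Only the two chosen experts are evaluated; the other six are skipped.
    outputs = [experts[i](token) for i in chosen]
    combined = [sum(g * out[d] for g, out in zip(gates, outputs))
                for d in range(len(token))]
    return combined, chosen


def make_expert(scale):
    """Toy stand-in for an expert feedforward block: scales its input."""
    return lambda vec: [scale * v for v in vec]


# Eight hypothetical experts that multiply the input by 0, 1, ..., 7.
EXPERTS = [make_expert(float(k)) for k in range(8)]


def split_parameters(total_b, active_b, n_experts=8, n_active=2):
    """Estimate shared and per-expert parameter counts (in billions).

    Assumption: total = shared + n_experts * per_expert and
    active = shared + n_active * per_expert.
    """
    per_expert = (total_b - active_b) / (n_experts - n_active)
    shared = active_b - n_active * per_expert
    return shared, per_expert


if __name__ == "__main__":
    out, chosen = top2_moe([1.0, 2.0], [0, 0, 0, 5.0, 0, 0, 5.0, 0], EXPERTS)
    print("chosen experts:", chosen, "output:", out)
    print("shared, per-expert (B):", split_parameters(46.7, 12.9))
```

Under these assumptions, the gap between 46.7B and 12.9B is the six experts left idle for each token, which gives roughly 5.6B parameters per expert (summed over all layers) and about 1.6B shared parameters.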

Performance Comparison

  • Matches or outperforms Llama 2 70B and GPT3.5 on most benchmarks
  • More efficient than the Llama 2 family at comparable quality
  • Mistral AI's announcement includes detailed per-benchmark results

Bias and Language

  • Shows less bias than Llama 2 on the BBQ benchmark
  • Displays more positive sentiment than Llama 2 on BOLD, with similar variance within each dimension

Instructed Models

  • Mistral AI also releases Mixtral 8x7B Instruct, optimized for instruction following
  • It reaches a score of 8.30 on MT-Bench, comparable to GPT3.5

Open-Source Deployment

  • Mistral AI submitted changes to the vLLM project so that Mixtral can be deployed with a fully open-source stack
  • SkyPilot enables deployment of vLLM endpoints on any cloud instance

Platform Usage

  • Mixtral 8x7B is available in beta behind the mistral-small endpoint
  • Early access registration is open for the generative and embedding endpoints

Acknowledgements

  • Mistral AI thanks the CoreWeave and Scaleway teams for technical support during model training
