Model Overview
- High-quality sparse mixture-of-experts (SMoE) model with open weights
- Licensed under Apache 2.0
- Outperforms Llama 2 70B with 6x faster inference
- Offers one of the best cost/performance trade-offs in its class, matching or surpassing GPT-3.5
Capabilities
- Handles a context of 32k tokens
- Supports English, French, Italian, German, and Spanish
- Strong performance in code generation (see the generation sketch after this list)
- Can be fine-tuned for instruction-following tasks, reaching 8.3 on MT-Bench
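To make the code-generation capability concrete, here is a minimal sketch of loading the base model through Hugging Face transformers and completing a code prompt. The checkpoint name, dtype/device settings, and prompt are assumptions rather than part of the release notes, and the full model requires substantial GPU memory (plus the accelerate package for `device_map="auto"`).

```python
# Illustrative sketch: generate a code completion with the base Mixtral
# checkpoint via Hugging Face transformers. Model id and hardware settings
# are assumptions; adapt them to your environment.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # requires the accelerate package
    torch_dtype="auto",  # pick the dtype stored in the checkpoint
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```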
Sparse Architectures
- Sparse mixture-of-experts network
- Decoder-only model whose feedforward block picks from 8 distinct groups of parameters (experts)
- A router network selects two experts to process each token and combines their outputs (see the routing sketch after this list)
- 46.7B total parameters, but only 12.9B active parameters per token
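To illustrate the routing idea, the following is a simplified sketch of a sparse mixture-of-experts feedforward layer with top-2 routing in PyTorch. The layer sizes, module structure, and plain Python loops are assumptions chosen for readability; this is not Mixtral's actual implementation.

```python
# Illustrative sparse MoE feedforward layer with top-2 routing:
# a router scores all experts per token, the two best experts process
# the token, and their outputs are combined with normalized weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feedforward network per expert (simplified; sizes are assumptions).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Router that scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):
        # x: (n_tokens, d_model)
        logits = self.router(x)                             # (n_tokens, n_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)   # top-2 experts per token
        weights = F.softmax(weights, dim=-1)                # normalize the two scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(SparseMoELayer()(tokens).shape)  # torch.Size([4, 512])
```

Because only two of the eight experts run for each token, the number of active parameters per token stays far below the total parameter count, which is the source of the cost/latency advantage described above.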
Performance Comparison
- Outperforms Llama 2 70B and matches or exceeds GPT-3.5 on most benchmarks
- More efficient than the Llama 2 family at a comparable quality level
- Detailed benchmark results are provided in the release announcement
Bias and Language
- Shows less bias than Llama 2 on the BBQ benchmark
- Displays more positive sentiment than Llama 2 on BOLD, with similar variance within each dimension
Instructed Models
- Mixtral 8x7B Instruct, optimized for instruction following, is released alongside the base model (a prompt-formatting sketch follows this list)
- Reaches a score of 8.30 on MT-Bench, comparable to GPT-3.5
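The instruct model expects prompts wrapped in its `[INST] ... [/INST]` chat format. The sketch below produces that format with the tokenizer's chat template; the checkpoint name is an assumption, and the exact template comes from the tokenizer itself.

```python
# Sketch: format a conversation for Mixtral 8x7B Instruct using the
# tokenizer's built-in chat template (checkpoint name is an assumption).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
messages = [{"role": "user", "content": "Summarize what a sparse mixture-of-experts model is."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # e.g. "<s>[INST] Summarize what a sparse mixture-of-experts model is. [/INST]"
```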
Open-Source Deployment
- Changes submitted to the vLLM project enable fully open-source deployment (a local-serving sketch follows this list)
- SkyPilot can be used to deploy vLLM endpoints on any cloud instance
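As a rough illustration of local open-source serving, the following sketch uses vLLM's offline Python API. The model id, tensor-parallel degree, and sampling settings are assumptions and should be adapted to the available hardware.

```python
# Sketch: serve Mixtral locally with the open-source vLLM library.
# Model id, tensor_parallel_size, and sampling settings are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["[INST] Explain mixture-of-experts routing briefly. [/INST]"], params)
print(outputs[0].outputs[0].text)
```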
Platform Usage
- Mixtral 8x7B is served behind the mistral-small endpoint, currently in beta (an API-call sketch follows this list)
- Registration is open for early access to all generative and embedding endpoints
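Calling the hosted mistral-small endpoint looks roughly like the sketch below, which assumes Mistral's OpenAI-compatible chat-completions route and an API key in the MISTRAL_API_KEY environment variable; consult the official platform docs for the authoritative request format.

```python
# Sketch: query the mistral-small endpoint on Mistral's platform.
# URL and payload shape are assumptions based on the OpenAI-compatible
# chat API; MISTRAL_API_KEY must be set in the environment.
import os
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small",
        "messages": [{"role": "user", "content": "Say hello in French, Italian, German and Spanish."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```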
Acknowledgements
- Thanks to CoreWeave and Scaleway teams for technical support during model training.