Model Overview
- 7.3B parameter model
- Outperforms Llama 2 13B on all benchmarks
- Approaches CodeLlama 7B performance on code tasks
- Utilizes grouped-query attention (GQA) for faster inference (see the sketch after this list)
- Incorporates Sliding Window Attention (SWA) for handling longer sequences efficiently
- Released under Apache 2.0 license
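As a rough illustration of the grouped-query attention bullet above, here is a minimal NumPy sketch (toy head counts and helper names are illustrative assumptions, not Mistral's implementation): groups of query heads share a single key/value head, so the key/value cache is several times smaller than with standard multi-head attention.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of n_q_heads // n_kv_heads query heads shares one key/value head,
    so the KV cache stores n_kv_heads heads instead of n_q_heads."""
    n_q_heads, seq_len, d = q.shape
    group = n_q_heads // k.shape[0]
    # Broadcast each KV head to all query heads in its group.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)     # (n_q_heads, seq, seq)
    # Softmax over keys (causal masking omitted for brevity).
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v                                  # (n_q_heads, seq, d)

# Toy shapes: 8 query heads sharing 2 KV heads.
q = np.random.randn(8, 16, 64)
k = np.random.randn(2, 16, 64)
v = np.random.randn(2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 64)
```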
Performance Highlights
- Surpasses Llama 2 13B on all metrics
- Comparable to Llama 1 34B on many benchmarks
- Demonstrates superior capabilities in code, reasoning, and English tasks
- Ships with a chat fine-tuned variant that outperforms the Llama 2 13B chat model
Equivalent Model Sizes
- Mistral 7B performs on par with a Llama 2 model three times its size on reasoning, comprehension, and STEM reasoning (MMLU) benchmarks
- This translates into significant memory savings and higher serving throughput
Attention Mechanisms
- Utilizes Sliding Window Attention (SWA) for linear compute cost and improved speed
- Linear compute cost of O(sliding_window · seq_len)
- Since the attention span of each layer is fixed, the key/value cache can be capped at sliding_window entries using a rolling buffer, improving memory efficiency while information still reaches earlier tokens through stacked layers (see the sketch after this list)
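A minimal NumPy sketch of the sliding-window idea described above (window size, shapes, and helper names are illustrative, not Mistral's configuration): each token attends to at most sliding_window positions, which bounds per-token work and lets the key/value cache be a fixed-size rolling buffer.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean mask: position i may attend to positions j with i - window < j <= i,
    i.e. itself plus the window - 1 tokens immediately before it."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.astype(int))
# Each row has at most `window` ones, so attention work per token is bounded by
# the window and the total cost scales as O(sliding_window * seq_len) rather
# than O(seq_len**2) for full attention.

# Because the attention span is fixed, the key/value cache only needs `window`
# slots: a token at position `pos` overwrites slot `pos % window`, keeping
# cache memory constant regardless of sequence length.
window = 3
for pos in range(8):
    print(f"token {pos} -> cache slot {pos % window}")
```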
Fine-Tuning for Chat
- Fine-tuned on instruction datasets publicly available on the Hugging Face Hub
- The Mistral 7B Instruct model outperforms all other 7B models on MT-Bench and is comparable to 13B chat models (see the usage sketch after this list)
- No tricks or proprietary data used in fine-tuning
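A minimal usage sketch for the chat-tuned model with the Hugging Face transformers library; it assumes the mistralai/Mistral-7B-Instruct-v0.1 checkpoint and enough memory to load it, and it shows querying the released model rather than reproducing the fine-tuning itself.

```python
# Requires: pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Explain sliding window attention in two sentences."}
]
# apply_chat_template wraps the message in the model's instruction format.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```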