Mistral AI Releases Mixtral 8x22B Under Apache 2.0
Mistral AI has released Mixtral 8x22B, a sparse Mixture-of-Experts model with 141B total parameters but only 39B active parameters, under the permissive Apache 2.0 license. The model features a 64K token context window, native function calling, multilingual support across five European languages, and strong math and coding performance. Mistral claims it outperforms all other open-weight models on standard benchmarks while being faster than dense 70B models due to sparse activation. An instructed version achieves 90.8% on GSM8K maj@8.
Related guides (4)
Related events (8)
Mixtral 8x7B: Mistral AI Releases Sparse Mixture-of-Experts Open-Weight Model
Mistral AI has released Mixtral 8x7B, a sparse mixture-of-experts (SMoE) model with 46.7B total parameters but only 12.9B active parameters per token, enabling inference speed and cost equivalent to a 12.9B model. Licensed under Apache 2.0, Mixtral outperforms Llama 2 70B on most benchmarks and matches or exceeds GPT-3.5, with support for 32k context, five European languages, and strong code generation. An instruction-tuned variant (Mixtral 8x7B Instruct) achieves 8.3 on MT-Bench, claimed best among open-source models at release. The model is deployed behind Mistral's mistral-small API endpoint and supported via vLLM with Megablocks CUDA kernels.
Mistral Small 4: Unified Multimodal, Reasoning, and Coding MoE Model Released Under Apache 2.0
Mistral AI has released Mistral Small 4, a 119B-parameter Mixture-of-Experts model (6B active per token) that unifies capabilities previously split across Magistral (reasoning), Pixtral (multimodal), and Devstral (coding agents) into a single open-weights model. The model features a 256k context window, configurable reasoning effort via a `reasoning_effort` parameter, native text and image input support, and is released under Apache 2.0. Mistral claims 40% latency reduction and 3x throughput improvement over Mistral Small 3, with benchmark results showing competitive performance against GPT-OSS 120B and Qwen models while producing significantly shorter outputs. The release includes day-0 availability as an NVIDIA NIM and support across vLLM, llama.cpp, SGLang, and Transformers.
Mistral Small 3: 24B Latency-Optimized Open-Weight Model Released Under Apache 2.0
Mistral AI has released Mistral Small 3, a 24B-parameter instruction-tuned model optimized for low latency, achieving over 81% on MMLU at 150 tokens/s on a single GPU. The model is competitive with Llama 3.3 70B and Qwen 32B while being more than 3x faster on equivalent hardware, and is released under Apache 2.0 for both pretrained and instruction-tuned checkpoints. It is explicitly not trained with RL or synthetic data, positioning it as a base model for community fine-tuning and reasoning capability development. Deployment targets include local inference on consumer hardware (RTX 4090, MacBook 32GB RAM), agentic function calling, and domain-specific fine-tuning.
Mistral AI Founding Manifesto and Mistral 7B Release
Mistral AI published its founding mission statement alongside the release of Mistral 7B, a 7-billion-parameter open-weights language model released under Apache 2.0. The model claims to outperform all available open models up to 13B parameters on standard English and code benchmarks, produced in three months from a standing start. The post articulates Mistral's strategic thesis: open-weight models will outcompete proprietary black-box APIs for most enterprise use cases, drawing analogies to Linux, WebKit, and Kubernetes. The company signals intent to release progressively larger frontier models while building a commercial offering around on-premise and VPC deployment.
Mistral Small 3.1: Multimodal, 128k Context, Apache 2.0 Open-Weight Model
Mistral AI releases Mistral Small 3.1, a ~24B parameter model with multimodal understanding, 128k token context window, and claimed best-in-class performance among small models, outperforming Gemma 3 and GPT-4o Mini on text, multimodal, and multilingual benchmarks. The model runs on a single RTX 4090 or 32GB RAM Mac at 150 tokens/second and is released under Apache 2.0 license with both base and instruct checkpoints. It is available on HuggingFace, Mistral's La Plateforme API, and Google Cloud Vertex AI, with NVIDIA NIM and Azure AI Foundry support coming soon. The release targets enterprise and on-device use cases including document verification, agentic workflows, and domain fine-tuning.
Mistral NeMo: 12B Open-Weights Model with 128k Context, Built with NVIDIA
Mistral AI and NVIDIA jointly release Mistral NeMo, a 12B parameter model under Apache 2.0 license featuring a 128k token context window and a new tokenizer called Tekken based on Tiktoken. The model is designed as a drop-in replacement for Mistral 7B, supports multilingual applications across 11+ languages, and was trained with quantization awareness enabling FP8 inference without performance loss. Benchmark comparisons show competitive performance against Gemma 2 9B and Llama 3 8B. Weights are available on HuggingFace and the model is also packaged as an NVIDIA NIM inference microservice.
Mistral AI Releases Devstral: Apache 2.0 Agentic Coding Model with SWE-Bench SOTA
Mistral AI, in collaboration with All Hands AI, releases Devstral, an agentic LLM specialized for software engineering tasks under the Apache 2.0 license. The model achieves 46.8% on SWE-Bench Verified, surpassing prior open-source state-of-the-art by over 6 percentage points and outperforming larger models like DeepSeek-V3-0324 (671B) and Qwen3 232B-A22B under the same OpenHands scaffold. Devstral is small enough to run on a single RTX 4090 or a Mac with 32GB RAM, and is available via Mistral's API at $0.1/M input tokens, as well as on HuggingFace, Ollama, and other platforms. Mistral indicates a larger agentic coding model is in development.
Mistral Large 2 (123B): New Frontier Model with 128k Context, Multilingual and Code Capabilities
Mistral AI releases Mistral Large 2, a 123-billion-parameter model with a 128k context window, supporting 80+ coding languages and over a dozen natural languages. The model claims competitive performance with GPT-4o, Claude 3 Opus, and Llama 3 405B on code generation, reasoning, and multilingual benchmarks, while targeting cost-efficient single-node inference. Weights are available under a Mistral Research License for non-commercial use, with a commercial license required for self-deployment. The model is accessible via Mistral's la Plateforme API (mistral-large-2407), HuggingFace, and Google Cloud Vertex AI.



