Mixtral
mixtral-895c4671 · 2 events · first seen 1mo ago
Aliases: Mixtral
Recent events (2)
Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters
Alibaba's Qwen team releases Qwen1.5-MoE-A2.7B, a mixture-of-experts model with only 2.7 billion activated parameters that the team says matches 7B dense models such as Mistral 7B and Qwen1.5-7B. The model activates roughly one-third as many parameters per token as those 7B models, which reduces inference compute. This release follows growing industry interest in MoE architectures sparked by Mixtral, and the model is available on GitHub, HuggingFace, and ModelScope.
Mistral Releases Mistral 3 Family: Mistral Large 3 (675B MoE) and Ministral 3 Series (3B–14B), All Apache 2.0
Mistral AI has announced Mistral 3, a family of open-weight models including Mistral Large 3 (41B active / 675B total sparse MoE) and three dense Ministral 3 edge models (3B, 8B, 14B), all released under Apache 2.0. Mistral Large 3 debuts at #2 on LMArena's OSS non-reasoning leaderboard, supports image understanding, and was trained on 3,000 NVIDIA H200 GPUs; a reasoning variant is forthcoming. The Ministral 3 series includes base, instruct, and reasoning variants with multimodal and multilingual capabilities, with the 14B reasoning model achieving 85% on AIME '25. The release involves deep co-optimization with NVIDIA (Blackwell/Hopper kernels, NVFP4 format), vLLM, and Red Hat, and is available across major cloud and inference platforms.
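The parameter figures quoted in these two events can be compared with a little arithmetic. The sketch below is illustrative only: the helper name `active_fraction` is hypothetical, and the inputs are the rounded numbers from the summaries above (41B active of 675B total for Mistral Large 3, and 2.7B activated against a 7B dense baseline for Qwen1.5-MoE-A2.7B).

```python
def active_fraction(active_b: float, reference_b: float) -> float:
    """Return active parameters as a share of a reference count (both in billions)."""
    if reference_b <= 0:
        raise ValueError("reference_b must be positive")
    return active_b / reference_b


# Mistral Large 3: active parameters as a share of total parameters.
large3_share = active_fraction(41, 675)

# Qwen1.5-MoE-A2.7B: activated parameters relative to a 7B dense model,
# which is the comparison behind the "1/3 activated parameters" headline.
qwen_vs_dense = active_fraction(2.7, 7)

print(f"Mistral Large 3 active share of total: {large3_share:.1%}")
print(f"Qwen1.5-MoE-A2.7B activated vs 7B dense: {qwen_vs_dense:.1%}")
```

Note that the two ratios measure different things: the first is sparsity within one model, while the second compares a sparse model's per-token compute footprint with a separate dense model.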