5 · Hugging Face Blog · 19d ago

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts model, announced on the Hugging Face blog. It appears to succeed JetBrains' earlier code-focused Mellum model. The post's body was not available to this summary, so capability details, benchmarks, and licensing terms cannot be confirmed from this source.

Frontier Model Releases · Open Weights Progress · Agent and Tool Ecosystem · Mellum2 · JetBrains · Mixture of Experts · Hugging Face · Mellum

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI

Mixture of Experts · Concept

Mixture of Experts: How AI Models Do More by Using Less

Frontier Model Releases · Topic guide

Frontier Model Releases: The Race From Language to Action

Related events (8)

9 · Mistral AI News · 19d ago

Mixtral 8x7B: Mistral AI Releases Sparse Mixture-of-Experts Open-Weight Model

Mistral AI has released Mixtral 8x7B, a sparse mixture-of-experts (SMoE) model with 46.7B total parameters but only 12.9B active parameters per token, enabling inference speed and cost equivalent to a 12.9B model. Licensed under Apache 2.0, Mixtral outperforms Llama 2 70B on most benchmarks and matches or exceeds GPT-3.5, with support for 32k context, five European languages, and strong code generation. An instruction-tuned variant (Mixtral 8x7B Instruct) achieves 8.3 on MT-Bench, claimed best among open-source models at release. The model is deployed behind Mistral's mistral-small API endpoint and supported via vLLM with Megablocks CUDA kernels.
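
A common rule of thumb (an approximation, not a figure from Mistral's announcement) is that a forward pass costs about 2 FLOPs per active parameter per token, while memory must still hold every parameter. A minimal sketch applying it to the Mixtral 8x7B figures above, assuming 16-bit (2-byte) weights and ignoring attention-score and KV-cache costs; the helper names are hypothetical:

```python
# Rough per-token compute vs. weight memory for a sparse MoE model.
# Assumptions (not from the announcement): ~2 FLOPs per active parameter
# per token for a forward pass, 2 bytes per parameter (bf16/fp16),
# attention-score and KV-cache costs ignored.

BYTES_PER_PARAM = 2  # bf16 / fp16


def forward_gflops_per_token(active_params: float) -> float:
    """Approximate forward-pass GFLOPs per token (2 * active params)."""
    return 2 * active_params / 1e9


def weight_memory_gb(total_params: float) -> float:
    """Memory needed just to hold every expert's weights, in GB (1e9 bytes)."""
    return total_params * BYTES_PER_PARAM / 1e9


total, active = 46.7e9, 12.9e9  # Mixtral 8x7B figures quoted above

print(f"MoE compute:   {forward_gflops_per_token(active):.1f} GFLOPs/token")
print(f"dense compute: {forward_gflops_per_token(total):.1f} GFLOPs/token")
print(f"weights:       {weight_memory_gb(total):.1f} GB")
```

Per-token compute tracks the 12.9B active figure (about 25.8 GFLOPs under this rule), but all 46.7B parameters (roughly 93 GB at 16 bits) still have to be loaded, which is the usual MoE trade of memory for speed.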

Frontier Model Releases · Evaluation and Benchmarking · Mistral AI · Llama 2 70B · Mistral Small 4 · +15 more

7 · Hugging Face Blog · 1mo ago

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Hugging Face published a blog post welcoming Mixtral, Mistral AI's Mixture of Experts (MoE) language model, to the platform. The post covers Mixtral's architecture, which uses 8 experts with 2 active per token, and its integration into the Hugging Face ecosystem including transformers support. Mixtral was positioned as a state-of-the-art open-weights model competitive with much larger dense models.
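
The "8 experts, 2 active per token" design can be illustrated with a toy router in pure Python. This is a simplified sketch, not the transformers implementation: the expert functions, router logits, and sizes are invented, and it follows the Mixtral paper's gating recipe of taking the top-2 router logits, applying a softmax over just those two, and summing the two experts' outputs weighted by the resulting probabilities.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_moe(
    x: Vector,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Route one token to its k highest-scoring experts and mix their outputs.

    Gate = softmax over the top-k logits only; the remaining experts are
    never evaluated, which is where the compute savings come from.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    gates = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)  # only k of the experts run for this token
        out = [o + g * yi for o, yi in zip(out, y)]
    return out


# Toy setup: 8 "experts" that each scale the input by a different factor (1..8).
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
# Experts 1 and 4 (scales 2 and 5) tie for the top score, so each gets weight 0.5.
logits = [0.1, 2.0, -1.0, 0.5, 2.0, -0.3, 0.0, 1.0]
print(top_k_moe([1.0, -1.0], logits, experts))  # [3.5, -3.5]
```

Only two of the eight expert functions are called for the token; with real feed-forward experts, that is the reason per-token cost scales with the active rather than the total parameter count.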

Frontier Model Releases · Open Weights Progress · Mistral AI · Mixtral 8x7B · Mixture of Experts · +2 more

5 · Hugging Face Blog · 1mo ago

Mixture of Experts Explained

This Hugging Face blog post provides a technical overview of the Mixture of Experts (MoE) architecture, explaining how sparse gating mechanisms route tokens to subsets of expert feed-forward layers to achieve computational efficiency. The post covers training dynamics, inference considerations, and the tradeoffs between dense and sparse models. It serves as a reference document contextualizing MoE's growing relevance following high-profile model releases using the architecture.
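
One training-dynamics problem that sparse gating introduces is router collapse, where a few experts receive most of the tokens. A widely used remedy is the Switch Transformer auxiliary load-balancing loss, N * sum_i(f_i * P_i), where N is the number of experts, f_i is the fraction of tokens dispatched to expert i, and P_i is the mean router probability for expert i. The pure-Python sketch below computes it for top-1 routing; the router probabilities are invented, and the scaling coefficient applied in training is omitted.

```python
from typing import Sequence


def load_balancing_loss(router_probs: Sequence[Sequence[float]]) -> float:
    """Switch-Transformer-style auxiliary loss for top-1 routing.

    router_probs[t][i] is the router's softmax probability of sending token t
    to expert i. Returns N * sum_i f_i * P_i, which equals 1.0 when tokens and
    probability mass are spread evenly across the N experts.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    # f_i: fraction of tokens whose top-1 (argmax) choice is expert i
    counts = [0] * num_experts
    for probs in router_probs:
        counts[max(range(num_experts), key=lambda i: probs[i])] += 1
    f = [c / num_tokens for c in counts]
    # P_i: average router probability assigned to expert i across the batch
    p = [sum(probs[i] for probs in router_probs) / num_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


balanced = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1],
            [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]]
collapsed = [[0.7, 0.1, 0.1, 0.1]] * 4  # every token prefers expert 0

print(round(load_balancing_loss(balanced), 3))   # 1.0
print(round(load_balancing_loss(collapsed), 3))  # 2.8
```

Because the loss grows when routing concentrates on a few experts, adding a small multiple of it to the training objective nudges the router toward spreading tokens out.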

Training Infrastructure · Frontier Model Releases · Mixture of Experts · Hugging Face · sparse gating · +1 more

8 · Mistral AI News · 19d ago

Mistral AI Releases Mixtral 8x22B Under Apache 2.0

Mistral AI has released Mixtral 8x22B, a sparse Mixture-of-Experts model with 141B total parameters but only 39B active parameters, under the permissive Apache 2.0 license. The model features a 64K token context window, native function calling, multilingual support across five European languages, and strong math and coding performance. Mistral claims it outperforms all other open-weight models on standard benchmarks while being faster than dense 70B models due to sparse activation. An instructed version achieves 90.8% on GSM8K maj@8.
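
The total and active counts let you back out a rough split between shared parameters (attention, embeddings, norms, router) and per-expert feed-forward parameters. A back-of-the-envelope sketch, assuming every parameter is either used by all tokens or belongs to exactly one of 8 experts, with 2 experts active per token (as in Mixtral 8x7B). The split it prints is derived arithmetic, not a figure Mistral published, and `split_params` is a hypothetical helper:

```python
from typing import Tuple


def split_params(total_b: float, active_b: float,
                 n_experts: int = 8, k: int = 2) -> Tuple[float, float]:
    """Solve total = S + n*E and active = S + k*E for (S, E), in billions.

    S = parameters every token uses (shared); E = parameters per expert,
    summed over all MoE layers. Assumes a clean shared/expert partition.
    """
    expert_b = (total_b - active_b) / (n_experts - k)
    shared_b = active_b - k * expert_b
    return shared_b, expert_b


for name, total, active in [("Mixtral 8x7B", 46.7, 12.9), ("Mixtral 8x22B", 141.0, 39.0)]:
    shared, per_expert = split_params(total, active)
    print(f"{name}: ~{shared:.1f}B shared, ~{per_expert:.1f}B per expert")
# Mixtral 8x7B: ~1.6B shared, ~5.6B per expert
# Mixtral 8x22B: ~5.0B shared, ~17.0B per expert
```

Under this assumption the naming also makes sense: "8x7B" is not 56B because the shared parameters are counted once, and each "7B" is roughly shared plus one expert (about 1.6B + 5.6B).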

Frontier Model Releases · Open Weights Progress · Mistral AI · Llama 2 70B · Apache 2.0 · +10 more

7 · Qwen Research · 1mo ago

Qwen2.5-Max: Large-Scale MoE Model Release by Alibaba's Qwen Team

Alibaba's Qwen team announces Qwen2.5-Max, a large-scale Mixture-of-Experts language model. The post acknowledges that scaling insights for very large MoE models have been limited, citing DeepSeek V3's recent disclosures as a reference point. The model is positioned as a frontier-scale MoE system developed concurrently with ongoing Qwen2 research.

Training Infrastructure · Frontier Model Releases · DeepSeek V4 · Alibaba · Qwen Team · +3 more

8 · Mistral AI News · 1mo ago

Mistral Small 4: Unified Multimodal, Reasoning, and Coding MoE Model Released Under Apache 2.0

Mistral AI has released Mistral Small 4, a 119B-parameter Mixture-of-Experts model (6B active per token) that unifies capabilities previously split across Magistral (reasoning), Pixtral (multimodal), and Devstral (coding agents) into a single open-weights model. The model features a 256k context window, configurable reasoning effort via a `reasoning_effort` parameter, native text and image input support, and is released under Apache 2.0. Mistral claims 40% latency reduction and 3x throughput improvement over Mistral Small 3, with benchmark results showing competitive performance against GPT-OSS 120B and Qwen models while producing significantly shorter outputs. The release includes day-0 availability as an NVIDIA NIM and support across vLLM, llama.cpp, SGLang, and Transformers.

Long Context Evolution · Frontier Model Releases · Mistral AI · Mistral Small 4 · Pixtral · +14 more

7 · Meta Llama · 11d ago

Meta releases Llama 4 Maverick 17B-128E multimodal model on Hugging Face

Meta released Llama 4 Maverick, a 17B active parameter model with 128 experts (mixture-of-experts architecture), on Hugging Face. The model supports image-text-to-text tasks, making it a multimodal open-weights release. This is part of the Llama 4 generation, representing Meta's latest open-weights frontier push with MoE architecture.
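
Meta's Llama 4 announcement (not this summary) describes Maverick's MoE layers as 128 routed experts plus one shared expert, with each token going to the shared expert and to a single routed expert; that is part of how a 128-expert model keeps only 17B parameters active. The toy sketch below shows that routing pattern in pure Python. The expert functions and logits are invented, and weighting the routed output by a sigmoid of its router logit is an assumption for illustration, not a claim about Meta's code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def shared_plus_top1(
    x: Vector,
    router_logits: Sequence[float],
    shared_expert: Expert,
    routed_experts: Sequence[Expert],
) -> Vector:
    """One token through a shared expert plus its single best routed expert.

    The sigmoid gate on the chosen expert is an illustrative assumption.
    """
    best = max(range(len(routed_experts)), key=lambda i: router_logits[i])
    gate = 1.0 / (1.0 + math.exp(-router_logits[best]))
    shared_out = shared_expert(x)         # every token takes this path
    routed_out = routed_experts[best](x)  # only 1 of the routed experts runs
    return [s + gate * r for s, r in zip(shared_out, routed_out)]


def shared(v: Vector) -> Vector:
    return [vi + 1.0 for vi in v]  # toy shared expert: add 1


n_routed = 128
routed = [lambda v, s=s: [s * vi for vi in v] for s in range(n_routed)]  # expert s scales by s
logits = [0.0] * n_routed
logits[7] = 50.0  # token strongly prefers routed expert 7, so the gate is ~1.0
print(shared_plus_top1([1.0, 2.0], logits, shared, routed))  # [9.0, 17.0]
```

Compared with the top-2 pattern used by Mixtral, the shared expert gives every token a common dense path, while the single routed expert adds specialization at the cost of only one expert's compute.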

Frontier Model Releases · Open Weights Progress · Llama 4 Maverick 17B-128E · Hugging Face · Meta · +1 more

4 · Hugging Face Blog · 1mo ago

Mixture of Experts (MoEs) in Transformers

A Hugging Face blog post on Mixture of Experts (MoE) architectures as applied to transformer models. The post likely explains the technical foundations, training considerations, and practical deployment of MoE models and, given its early-2026 publication, likely places recent MoE-based frontier models and tooling support in the context of the Hugging Face ecosystem.

Training Infrastructure · Frontier Model Releases · Transformers · Mixture of Experts · Hugging Face · +1 more