What this area covers
Open-weights progress tracks the multi-year effort to make frontier-class language models publicly available — weights downloadable, deployable on private infrastructure, and fine-tunable without API intermediaries. The thread spans model releases from Meta, Mistral AI, DeepSeek, Alibaba's Qwen team, Google DeepMind, and — most recently — OpenAI, as well as the inference infrastructure, licensing regimes, and safety debates that shape how those models are actually used.
Why it matters
The practical stakes are high on multiple axes. For practitioners, open weights mean the ability to fine-tune on proprietary data, run inference behind a firewall, and avoid per-token API costs at scale. For the broader AI ecosystem, the gap between open and closed models is a proxy for how concentrated frontier capability is — and how quickly that concentration can be disrupted. For safety researchers and policymakers, open weights introduce irreversible proliferation: once weights are public, they cannot be recalled.
Phase 1: Establishing the baseline (2022–2023)
The modern open-weights era begins with BLOOM (176B, July 2022), a collaborative multilingual model from Hugging Face and the BigScience workshop — the first open model at GPT-3 scale. Meta's Llama 2 (July 2023) shifted the dynamic: a well-resourced frontier lab releasing competitive weights under a broadly permissive license, distributed through Hugging Face with Microsoft as a partner.
Mistral AI then demonstrated that a small team could punch above its weight. Mistral 7B (September 2023, Apache 2.0) outperformed Llama 2 13B across all evaluated benchmarks using Grouped-Query Attention and Sliding Window Attention for efficient inference. Three months later, Mixtral 8x7B (December 2023, Apache 2.0) introduced sparse Mixture-of-Experts to the open ecosystem: 46.7B total parameters, only 12.9B active per token, matching or exceeding GPT-3.5 at the inference cost of a 12.9B dense model. This architectural pattern — large total capacity, small active footprint — became the template for nearly every major open release that followed.
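The "large total capacity, small active footprint" arithmetic can be made concrete with a back-of-envelope sketch. This is an illustration, not Mistral's published parameter breakdown. It assumes the model splits cleanly into shared parameters used by every token (attention, embeddings, routers) plus eight equally sized experts, two of which are routed per token. The function name and the clean split are assumptions made for this example.

```python
def split_moe_params(total_b, active_b, n_experts, top_k):
    """Solve the two-equation system (all values in billions of parameters):
         total  = shared + n_experts * per_expert
         active = shared + top_k     * per_expert
    Assumes equally sized experts and a clean shared/expert split.
    """
    if n_experts <= top_k:
        raise ValueError("n_experts must exceed top_k")
    per_expert = (total_b - active_b) / (n_experts - top_k)
    shared = active_b - top_k * per_expert
    return shared, per_expert


# Figures quoted above for Mixtral 8x7B; 8 experts with top-2 routing is assumed.
shared_b, expert_b = split_moe_params(46.7, 12.9, n_experts=8, top_k=2)
active_fraction = 12.9 / 46.7

print(f"shared ~{shared_b:.1f}B, per expert ~{expert_b:.1f}B, "
      f"active fraction {active_fraction:.0%}")
```

Under these assumptions, each expert holds roughly 5.6B parameters and only about 1.6B are shared, so a little over a quarter of the weights participate in any single token's forward pass.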
Phase 2: Scaling and multimodality (2024)
2024 was defined by scale races and capability expansion. Meta released Llama 3 (April 2024), then Llama 3.1 (July 2024) with a 405B flagship, multilingual support, and extended context — the first open model credibly positioned as frontier-class at release. Mistral followed with Mixtral 8x22B (April 2024, Apache 2.0, 141B total / 39B active, 64K context) and Mistral Large 2 (July 2024, 123B, 128K context, 80+ coding languages).
Alibaba's Qwen team emerged as a major force. Qwen2 (June 2024) introduced 128K context and strong multilingual coverage. Qwen2.5 (September 2024) was described as potentially the largest open-source model release in history by parameter count across the full family. Qwen2.5-Coder-32B (November 2024) claimed parity with GPT-4o on coding benchmarks — a significant milestone for a specialized open model.
Multimodality arrived in open weights: Llama 3.2 (September 2024) added vision-capable models alongside 1B/3B edge variants, and Qwen2.5-VL (January 2025) followed with vision-language models in three sizes, the largest at 72B.
Phase 3: Reasoning, agentic capability, and frontier parity (2025–2026)
The most consequential shift was DeepSeek-R1 (January 2025, MIT license, weights and outputs freely usable for distillation). Claiming parity with OpenAI o1 on math, code, and reasoning benchmarks, with six distilled smaller variants and API pricing at $0.55/$2.19 per million tokens, R1 demonstrated that reasoning-class capability was no longer a closed-lab exclusive. R1 builds on DeepSeek-V3 (671B MoE, 37B active, 14.8T training tokens, 60 tokens/second generation), released in December 2024 as a fully open-source frontier alternative with API pricing at $0.27/$1.10 per million tokens.
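To see what these price points mean for a workload, the following sketch computes the bill for a batch of requests. It assumes each quoted pair is (input price, output price) in USD per million tokens, which the text above does not state explicitly. The workload size and the function name are hypothetical.

```python
# Quoted prices, read here as (input, output) USD per million tokens.
# That input/output reading is an assumption, not stated in the source.
PRICES_USD_PER_M = {
    "DeepSeek-R1": (0.55, 2.19),
    "DeepSeek-V3": (0.27, 1.10),
}


def workload_cost_usd(model, input_tokens, output_tokens):
    """Return the API cost in USD for the given token counts."""
    in_price, out_price = PRICES_USD_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
for name in PRICES_USD_PER_M:
    cost = workload_cost_usd(name, 2_000_000, 500_000)
    print(f"{name}: ${cost:.3f}")
```

Because output tokens are priced about four times higher than input tokens in both pairs, workloads that generate long responses (as reasoning models do) are dominated by the second figure.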
Mistral expanded its open portfolio into reasoning with Magistral Small (24B, Apache 2.0, 70.7% on AIME2024), coding agents with Devstral 2 (123B, 72.2% SWE-bench Verified, 256K context), and speech with Voxtral (24B and 3B, Apache 2.0, outperforming Whisper large-v3). Qwen3 (April 2025) brought a 235B MoE flagship claiming competitive performance against DeepSeek-R1, OpenAI o1/o3-mini, Grok-3, and Gemini-2.5-Pro.
The most strategically significant event of 2025 was OpenAI's entry into open weights. In August 2025, OpenAI released gpt-oss-120b and gpt-oss-20b under Apache 2.0 — a direct reversal of its historically closed posture, driven by competitive pressure and framed around accessibility and global reach. The release was accompanied by a safety evaluation methodology (malicious fine-tuning, or MFT) designed to assess worst-case risks before open-weight releases, signaling that safety governance for open models was becoming a first-class concern.
Into 2026, the frontier continued to move. Mistral Large 3 (675B MoE / 41B active, Apache 2.0) debuted at #2 on LMArena's OSS non-reasoning leaderboard. DeepSeek-V4-Pro (1.6T total / 49B active, 1M context by default via token-wise compression and DeepSeek Sparse Attention) claimed open-source SOTA on agentic coding. Qwen3-Coder-480B-A35B (480B MoE, 35B active, 256K native context, 1M via extrapolation) claimed open-weight SOTA on agentic coding and browser use, with performance described as comparable to Claude Sonnet 4. Mistral Medium 3.5 (128B dense, 77.6% SWE-Bench Verified, 256K context, runs on four GPUs) demonstrated that dense models could remain competitive with MoE at the 128B scale. Google DeepMind released the Gemma 4 family and a Gemma 4 12B model with a unified, encoder-free multimodal architecture.
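Claims such as "runs on four GPUs" can be sanity-checked with a rough weight-memory estimate. The sketch below is a minimal approximation under stated assumptions: 80 GB per GPU, a 10% headroom reserve for KV cache and activations, and the listed bytes-per-parameter values. None of these come from the releases themselves, and real deployments depend on the serving framework and context length.

```python
# Approximate storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion, dtype):
    """Weights only: 1e9 params * bytes-per-param / 1e9 bytes-per-GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion, dtype, n_gpus, gpu_gb=80.0, headroom=0.1):
    """True if the weights fit after reserving `headroom` of total memory.

    gpu_gb and headroom are illustrative assumptions, not vendor figures.
    """
    budget_gb = n_gpus * gpu_gb * (1.0 - headroom)
    return weight_memory_gb(params_billion, dtype) <= budget_gb


# 128B dense model in bf16 on four assumed 80 GB GPUs.
print(weight_memory_gb(128, "bf16"), fits(128, "bf16", n_gpus=4))

# For MoE, all experts normally stay resident, so total (not active)
# parameters drive the memory footprint.
print(weight_memory_gb(1600, "fp8"), fits(1600, "fp8", n_gpus=4))
```

The second call illustrates a design point worth keeping in mind when comparing the releases above: sparse activation reduces compute per token, but in a typical serving setup it does not reduce the memory needed to hold the weights.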
The infrastructure layer
Model releases are only half the story. The serving and fine-tuning stack that makes open weights usable has matured in parallel. vLLM, SGLang, llama.cpp, and Transformers are the dominant inference frameworks; NVIDIA NIM provides enterprise packaging with day-0 support for many releases. The most significant infrastructure event was GGML and llama.cpp joining Hugging Face in February 2026, consolidating the primary local-inference stack — which underpins most consumer and on-device deployments — under a single open-source organization with long-term sustainability backing.
Hugging Face itself functions as the de facto distribution layer: nearly every major open-weights release covered here is available there, often on the same day as the lab announcement.
Safety and governance tensions
Open weights introduce irreversible proliferation risk that closed APIs do not. Two distinct concerns have crystallized:
Distillation attacks. Anthropic publicly identified DeepSeek, Moonshot AI, and MiniMax as conducting coordinated large-scale distillation campaigns against Claude — generating over 16 million exchanges through approximately 24,000 fraudulent accounts to train competing models. Anthropic framed this as a national security concern, arguing that illicitly distilled models strip out safety safeguards and undermine export controls.
Malicious fine-tuning. OpenAI's malicious fine-tuning (MFT) methodology assesses the worst-case risk of an open-weight release by deliberately fine-tuning the model toward harm and probing for dangerous capability uplift in biology and cybersecurity. It points to an emerging norm: safety evaluation before open release, not just before API deployment.
Meta's Muse Spark (April 2026) — the first closed-weights model from Meta's Superintelligence Labs — signals that even the most committed open-weights lab is hedging: some capability tiers may remain proprietary regardless of the competitive environment.
Where it is heading
The open-weights frontier is no longer defined by the capability gap to closed models. On coding, math, and reasoning benchmarks, the releases above claim that gap has largely closed. The active frontiers are:
- Agentic and long-context capability: 1M-token context windows and tool-integrated chain-of-thought are now table stakes for flagship open releases.
- Inference economics: MoE architectures, sparse attention (DeepSeek Sparse Attention, DSA), and aggressive API price cuts are driving the cost of frontier-class inference toward commodity levels.
- Safety governance for open weights: MFT evaluations, distillation detection, and the question of which capability tiers should remain closed are unresolved and increasingly contested.
- Infrastructure consolidation: The move of llama.cpp/GGML to Hugging Face suggests the ecosystem is shifting toward a more unified, sustainably maintained serving stack rather than a fragmented collection of independent projects.




