Topic guide · Beginner

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

v1 · live · generated 6d ago

TL;DR: Open-weights AI (models whose parameters anyone can download, run, and modify) began as a scrappy alternative to proprietary systems but has steadily closed the gap with the best closed models. What started with community releases like BLOOM and Llama 2 has evolved into a multi-lab race in which DeepSeek, Mistral, Qwen, Google, and even OpenAI now publish powerful models freely, reshaping who can build AI and how.

Key takeaways

Meta's Llama 3.1 405B (July 2024) was the first open-weights model widely regarded as frontier-class, directly competing with closed models at the time.
DeepSeek-R1 (MIT license) claimed performance parity with OpenAI o1 on math and reasoning benchmarks, with API pricing of $0.55 per million input tokens and $2.19 per million output tokens, a fraction of comparable closed-model costs.
OpenAI reversed its historically closed strategy in August 2025, releasing gpt-oss-120b and gpt-oss-20b under Apache 2.0.
Mistral Small 4 (119B MoE, Apache 2.0) unified reasoning, vision, and coding into a single model deployable on consumer hardware with 40% lower latency than its predecessor.
Hugging Face acquired llama.cpp and GGML in February 2026, consolidating the key tools that let people run large models on ordinary computers under one roof.
Safety concerns have grown alongside capability: OpenAI published a 'malicious fine-tuning' methodology to assess worst-case risks before open-weight releases, and Anthropic identified large-scale distillation attacks by Chinese labs using Claude outputs to train open models.

What open-weights AI is — and why it matters

When an AI lab trains a large language model, the result is a massive file of numbers called weights — the model's "brain." A closed model keeps those numbers secret; you can only use it through the company's website or API. An open-weights model publishes those numbers for anyone to download, run on their own hardware, fine-tune for a specific job, or build into a product.

That distinction has enormous practical consequences. Open weights mean lower costs (no per-token fees once you're running it yourself), more privacy (your data never leaves your servers), and the freedom to customize. They also mean anyone — including bad actors — can remove safety guardrails. The story of open-weights progress is really the story of that tradeoff playing out at scale.

How it started: from research curiosity to community movement

The modern open-weights era has roots in academic language model research, but the first landmark moment for the general public was BLOOM in July 2022 — a 176-billion-parameter model built collaboratively by over 1,000 researchers through the BigScience workshop and Hugging Face. It proved that a frontier-scale model could be built and released openly, without a single company controlling it.

The real ignition point, though, was Meta's release of Llama 2 in July 2023. Meta is one of the world's largest AI research organizations, and putting a capable model family into developers' hands for free, under a custom license that allowed most commercial use, triggered an explosion of community fine-tuning, tooling, and downstream products. Within months, a small French startup called Mistral AI released Mistral 7B under the Apache 2.0 license (meaning fully free for commercial use), and it outperformed Meta's own Llama 2 13B despite being roughly half the size. The message was clear: open-weights models could be both capable and efficient.

The efficiency breakthrough: Mixture-of-Experts

One of the most important technical ideas to spread through the open-weights world is the Mixture-of-Experts (MoE) architecture. Instead of using all of a model's parameters for every word it processes, an MoE model routes each token through only a small subset of "expert" sub-networks. The result: a model that is large on paper but fast and cheap in practice.
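The routing step can be sketched in a few lines. This is a minimal pure-Python illustration, assuming a single linear router that scores every expert and keeps the top k = 2 (Mixtral, discussed next, sends each token to 2 of its 8 experts). The function names, toy experts, and random router weights are all hypothetical; real models apply learned weight matrices on GPUs, and this is not any lab's actual implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, router_weights, k=2):
    """Pick the top-k experts for one token; return (expert_index, gate_weight) pairs."""
    # One score per expert: dot product of the token vector with that expert's router row.
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_weights]
    # Keep only the k highest-scoring experts; every other expert is skipped.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the chosen experts only, so their gate weights sum to 1.
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(token, router_weights, experts, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(token)
    for idx, gate in route_token(token, router_weights, k):
        expert_out = experts[idx](token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out


# Toy demo (hypothetical): 8 "experts" that just scale their input, logging each call.
calls = []


def make_expert(i):
    def expert(x):
        calls.append(i)
        return [(i + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(8)]
random.seed(0)
router = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
output = moe_layer([0.2, -0.1, 0.4, 0.3], router, experts, k=2)
print("experts that ran:", calls)  # only 2 of the 8
```

Only the selected experts' functions are ever called, which is where the savings come from: compute per token grows with k, not with the total number of experts.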

Mistral's Mixtral 8x7B (December 2023) brought this idea to the open-weights community in a big way. It has 46.7 billion total parameters but only activates 12.9 billion at a time — giving it the inference speed of a ~13B model while matching or beating GPT-3.5 on benchmarks. It was released under Apache 2.0 and became one of the most widely deployed open models. MoE has since become the dominant architecture for large open-weights releases: DeepSeek V3 (671B total / 37B active), Qwen3 (235B total / 22B active), and Mistral's own later releases all use it.
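The total-versus-active figures above come down to simple arithmetic. The sketch below computes the share of weights each model uses per token and, for Mixtral, solves for a rough split between shared weights and per-expert weights. That split is an assumption, not a published breakdown: it treats every non-expert weight (attention, embeddings) as shared and Mixtral's 8 experts as equal in size, with 2 active per token.

```python
def active_fraction(total_b, active_b):
    """Share of a model's parameters used for any single token."""
    return active_b / total_b


def implied_split(total_b, active_b, n_experts, k):
    """Solve shared + n*E = total and shared + k*E = active for (shared, E).

    Assumes all non-expert weights are shared and every expert has the same size E.
    """
    expert = (total_b - active_b) / (n_experts - k)
    shared = active_b - k * expert
    return shared, expert


# (total, active) in billions of parameters, as quoted in this guide.
models = {
    "Mixtral 8x7B": (46.7, 12.9),
    "DeepSeek V3": (671, 37),
    "Qwen3": (235, 22),
}
for name, (total, active) in models.items():
    print(f"{name}: {active_fraction(total, active):.1%} of weights active per token")

shared, expert = implied_split(46.7, 12.9, n_experts=8, k=2)
print(f"Mixtral: ~{shared:.1f}B shared + 8 x ~{expert:.1f}B experts")
```

This prints roughly 27.6%, 5.5%, and 9.4%, then about 1.6B shared plus 8 experts of about 5.6B each. Under that assumption, it shows why "8x7B" totals 46.7B rather than 56B: each expert is only one part of the network, and the shared weights are counted once.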

Closing the gap: 2024 and the frontier opens up

By mid-2024, the open-weights world had caught up enough that the question shifted from "can open models be useful?" to "can they match the very best closed models?"

Meta's Llama 3.1 405B (July 2024) was the clearest answer yet: yes, at least on many benchmarks. It was Meta's largest open-weights release, with multilingual support and extended context, and it was widely regarded as the first open model to genuinely compete with closed frontier models. Alibaba's Qwen team was also releasing rapidly — Qwen2 (June 2024), Qwen2.5 (September 2024), and a specialized coding variant, Qwen2.5-Coder-32B, that claimed parity with GPT-4o on coding benchmarks.

Mistral kept pace as well: Mixtral 8x22B (April 2024, Apache 2.0) claimed to outperform all other open-weight models on standard benchmarks at the time of its release, and Mistral Large 2 (123B) followed in July 2024.

Meta also expanded the Llama family into new territory: Llama 3.2 (September 2024) added vision — the ability to understand images — and introduced tiny 1B and 3B models designed to run on phones and edge devices.

The reasoning revolution: DeepSeek changes the conversation

In early 2025, DeepSeek, a Chinese AI lab, released DeepSeek-R1 under the MIT license (one of the most permissive licenses available, explicitly allowing distillation into other models). The claim: performance on par with OpenAI's o1 on math, code, and reasoning tasks, with API pricing of $0.55 per million input tokens and $2.19 per million output tokens, a fraction of comparable closed-model costs. Six smaller distilled variants were also released, with the 32B and 70B versions reportedly matching OpenAI o1-mini.
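To see what per-million-token pricing means in practice, here is a small cost calculator using the $0.55 input / $2.19 output prices quoted for DeepSeek-R1. The workload is hypothetical, and real bills depend on current prices and on how many tokens a model actually generates.

```python
# Prices in USD per million tokens, as quoted for DeepSeek-R1 in this guide.
INPUT_PRICE_PER_M = 0.55
OUTPUT_PRICE_PER_M = 2.19


def api_cost(input_tokens, output_tokens,
             in_price=INPUT_PRICE_PER_M, out_price=OUTPUT_PRICE_PER_M):
    """Dollar cost of a workload billed separately for input and output tokens."""
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price


# Hypothetical workload: 10 million tokens sent in, 2 million tokens generated.
cost = api_cost(10_000_000, 2_000_000)
print(f"${cost:.2f}")  # $5.50 input + $4.38 output
```

Because output is priced about four times higher than input here, workloads that generate long answers cost noticeably more than ones that mostly read.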

This was a turning point. Reasoning models — which "think out loud" through a problem before answering — had been seen as a closed-model advantage. DeepSeek-R1 showed that wasn't necessarily true, and the MIT license meant anyone could build on it freely.

DeepSeek's base models moved just as fast. DeepSeek-V3 (December 2024), the 671B MoE model that R1 was built on, ran at 60 tokens per second, three times faster than its predecessor, and the V4 preview later pushed to 1.6 trillion total parameters with a 1-million-token context window and open weights.

The ecosystem matures: tools, multimodality, and new players

Open-weights progress isn't just about the models themselves — it's about the infrastructure that makes them usable.

llama.cpp is a piece of software that lets people run large language models on ordinary laptops and desktops, without expensive server hardware. In February 2026, Hugging Face — the platform that hosts most open-weights models — acquired both llama.cpp and its underlying library GGML, consolidating the key inference tools under one roof and securing their long-term development.

Multimodal capabilities (understanding images, audio, and video alongside text) have also moved into the open-weights world. Qwen2.5-VL brought vision to the Qwen family; Llama 3.2 added it to Llama; Mistral released Voxtral for speech understanding; and Google DeepMind's Gemma 4 introduced an encoder-free multimodal architecture at the 12B scale.

Mistral has been particularly prolific, releasing a full stack of specialized models (Magistral for reasoning, Devstral for coding agents, Voxtral for speech) and then folding reasoning, vision, and coding into Mistral Small 4: a single 119B MoE model under Apache 2.0 with 40% lower latency than its predecessor and support in vLLM, llama.cpp, SGLang, and Transformers.

OpenAI joins the open-weights world

Perhaps the most symbolically significant development came in August 2025, when OpenAI — the company that had kept its models closed since GPT-2 in 2019 — released gpt-oss-120b and gpt-oss-20b under the Apache 2.0 license. The models were hosted on Hugging Face and claimed to outperform similarly sized open models on reasoning tasks. OpenAI framed the move around accessibility and global reach, and published a safety methodology — "malicious fine-tuning" — to assess worst-case risks before the release.

The safety question grows louder

As open-weights models have become more capable, the safety debate has intensified. The core tension: the same openness that lets a researcher fine-tune a model for a medical application also lets a bad actor remove its safety guardrails.

OpenAI's malicious fine-tuning research (published alongside the gpt-oss release) attempted to quantify how much "uplift" — extra capability for harm — a fine-tuned open model could provide in biology and cybersecurity. Anthropic, meanwhile, publicly identified three Chinese AI labs — DeepSeek, Moonshot AI, and MiniMax — as conducting large-scale "distillation attacks" against Claude, generating over 16 million exchanges through roughly 24,000 fraudulent accounts to train open models on Claude's outputs. Anthropic framed this as a national security concern, arguing that illicitly distilled models strip out safety features.

Separately, Meta's own trajectory has grown more complicated: its Muse Spark model (April 2026), released through its new Superintelligence Labs, was deliberately closed — withholding parameter count, architecture, and training details — marking a notable departure from the open Llama strategy.

Where it's heading

The open-weights world in 2026 looks nothing like it did in 2022. The gap to closed frontier models has narrowed to the point where, on many benchmarks, the best open models are competitive with the best closed ones. The ecosystem of tools for running, fine-tuning, and deploying these models has matured and consolidated. And the roster of labs publishing open weights now includes not just Meta and Mistral but DeepSeek, Alibaba, Google DeepMind, and OpenAI itself.

The open questions are no longer about capability — they're about safety, licensing, and geopolitics. Who bears responsibility when an open model is misused? How do export controls apply to model weights? And as models grow more powerful, will the labs that have championed openness continue to do so — or follow Meta's Muse Spark toward a more closed future?

Landmark open-weights models at a glance

Model	Lab	Size	License	Notable first
BLOOM	Hugging Face / BigScience	176B	Open access	First large collaborative open multilingual LLM (2022)
Llama 2	Meta	Multiple	Custom (open)	Meta's first broadly open-weights release (2023)
Mistral 7B	Mistral AI	7B	Apache 2.0	Outperformed Llama 2 13B at 7B scale (2023)
Mixtral 8x7B	Mistral AI	46.7B total / 12.9B active	Apache 2.0	First widely-used sparse MoE open model (2023)
Llama 3.1 405B	Meta	405B	Open weights	First frontier-class open model (2024)
DeepSeek-R1	DeepSeek	Multiple	MIT	Open reasoning model on par with OpenAI o1 (2025)
gpt-oss-120b	OpenAI	120B	Apache 2.0	OpenAI's first open-weights language model since GPT-2 (2025)
DeepSeek V4-Pro	DeepSeek	1.6T total / 49B active	Open weights	1M context open-weights model (2026)

FAQ

What does 'open weights' actually mean?

It means the trained model's parameters — the numbers that define how it thinks — are published for anyone to download. You can run the model on your own hardware, modify it, or build products with it, rather than only accessing it through a company's API.

Are open-weights models as good as ChatGPT or Claude?

The gap has narrowed dramatically. Models like DeepSeek-R1 and Llama 3.1 405B have matched closed models on specific benchmarks, though the very latest closed frontier models still tend to lead on the hardest tasks.

Is it safe to use open-weights models?

It depends on the use case. Because anyone can fine-tune them, safety guardrails can be removed. OpenAI published research on exactly this risk before releasing its own open models, and Anthropic identified labs using distillation attacks to train models on Claude's outputs, arguing that the resulting models lack Claude's safety features.

What is a Mixture-of-Experts (MoE) model?

An MoE model has a large total number of parameters but only activates a small fraction for each token it processes — for example, Mixtral 8x7B has 46.7B total parameters but uses only 12.9B at a time, making it much faster and cheaper to run than a dense model of the same total size.

Where do I actually get these models?

Hugging Face is the main hub — nearly every model in this piece is available there. Hugging Face also now owns llama.cpp and GGML, the tools most commonly used to run large models on ordinary computers.
