Almanac
Topic guide · Beginner

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

TL;DR: Open-weights AI (models whose parameters anyone can download, run, and modify) began as a scrappy alternative to proprietary systems but has steadily closed the gap with the best closed models. What started with community releases like BLOOM and Llama 2 has evolved into a multi-lab race where DeepSeek, Mistral, Qwen, Google, and even OpenAI now publish powerful models freely, reshaping who can build AI and how.

Key takeaways

  • Meta's Llama 3.1 405B (July 2024) was the first open-weights model widely regarded as frontier-class, directly competing with closed models at the time.
  • DeepSeek-R1 (MIT license) claimed performance parity with OpenAI o1 on math and reasoning benchmarks, with API pricing of $0.55 per million input tokens and $2.19 per million output tokens, a fraction of comparable closed-model costs.
  • OpenAI reversed its historically closed strategy in August 2025, releasing gpt-oss-120b and gpt-oss-20b under Apache 2.0.
  • Mistral Small 4 (119B MoE, Apache 2.0) unified reasoning, vision, and coding into a single model deployable on consumer hardware with 40% lower latency than its predecessor.
  • Hugging Face acquired llama.cpp and GGML in February 2026, consolidating the key tools that let people run large models on ordinary computers under one roof.
  • Safety concerns have grown alongside capability: OpenAI published a 'malicious fine-tuning' methodology to assess worst-case risks before open-weight releases, and Anthropic identified large-scale distillation attacks by Chinese labs using Claude outputs to train open models.

What open-weights AI is — and why it matters

When an AI lab trains a large language model, the result is a massive file of numbers called weights — the model's "brain." A closed model keeps those numbers secret; you can only use it through the company's website or API. An open-weights model publishes those numbers for anyone to download, run on their own hardware, fine-tune for a specific job, or build into a product.
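To get a feel for how large that file of numbers is, a rough sketch helps: the size of the weights is approximately the number of parameters multiplied by the bytes used to store each one. The function name and the precision table below are illustrative assumptions, not part of any library, and real checkpoints carry some extra metadata, so treat the results as ballpark figures.

```python
# Rough estimate of how much storage a model's weights need.
# Assumption: every parameter is stored at the same precision.

BYTES_PER_PARAM = {
    "fp32": 4.0,  # 32-bit floating point
    "fp16": 2.0,  # 16-bit floating point (bf16 is the same size)
    "int8": 1.0,  # 8-bit integers
    "int4": 0.5,  # 4-bit integers, two weights per byte
}


def weights_size_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate weight size in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Nominal parameter counts taken from the model names.
    for name, params in [("7B model", 7e9), ("405B model", 405e9)]:
        for precision in ("fp16", "int4"):
            size = weights_size_gb(params, precision)
            print(f"{name:10s} {precision}: ~{size:.1f} GB")
```

Under these assumptions a 7B model at 16-bit precision needs about 14 GB just for its weights, while a 405B model needs about 810 GB, which is why the largest open models still call for server-class hardware even though anyone may download them.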

That distinction has enormous practical consequences. Open weights mean lower costs (no per-token fees once you're running it yourself), more privacy (your data never leaves your servers), and the freedom to customize. They also mean anyone — including bad actors — can remove safety guardrails. The story of open-weights progress is really the story of that tradeoff playing out at scale.

How it started: from research curiosity to community movement

The modern open-weights era has roots in academic language model research, but the first landmark moment for the general public was BLOOM in July 2022 — a 176-billion-parameter model built collaboratively by over 1,000 researchers through the BigScience workshop and Hugging Face. It proved that a frontier-scale model could be built and released openly, without a single company controlling it.

The real ignition point, though, was Meta releasing Llama 2 in July 2023. Meta is one of the world's largest AI research organizations, and putting a capable model family into the hands of developers for free, under a custom license that allowed most commercial use, triggered an explosion of community fine-tuning, tooling, and downstream products. Within months, a small French startup called Mistral AI released Mistral 7B under the Apache 2.0 license (meaning fully free for commercial use), and it outperformed Meta's own Llama 2 13B despite being roughly half the size. The message was clear: open-weights models could be both capable and efficient.

The efficiency breakthrough: Mixture-of-Experts

One of the most important technical ideas to spread through the open-weights world is the Mixture-of-Experts (MoE) architecture. Instead of using all of a model's parameters for every word it processes, an MoE model routes each token through only a small subset of "expert" sub-networks. The result: a model that is large on paper but fast and cheap in practice.
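The routing idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not any lab's actual implementation: the names `route_token` and `moe_layer`, the four stand-in experts, and the fixed router scores are all hypothetical. A router scores every expert for a token, only the top-scoring few are run, and their outputs are blended using the renormalized scores.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k=2):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router, top_k=2):
    """Run one token vector through only its selected experts and mix the results."""
    output = [0.0] * len(token)
    for expert_id, weight in route_token(router(token), top_k):
        expert_out = experts[expert_id](token)  # unselected experts never run
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output


if __name__ == "__main__":
    # Toy setup (hypothetical): four "experts" that just scale the input,
    # and a router that returns fixed scores regardless of the token.
    experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
    router = lambda token: [0.1, 2.0, -1.0, 1.5]
    print(route_token(router([1.0, 1.0])))  # experts 1 and 3 are selected
    print(moe_layer([1.0, 1.0], experts, router))
```

In a real model the experts are feed-forward networks and the router is a small learned layer, but the cost saving has the same source: the experts that are not selected do no work for that token.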

Mistral's Mixtral 8x7B (December 2023) brought this idea to the open-weights community in a big way. It has 46.7 billion total parameters but activates only 12.9 billion for each token, giving it the inference speed of a ~13B model while matching or beating GPT-3.5 on benchmarks. It was released under Apache 2.0 and became one of the most widely deployed open models. MoE has since become the dominant architecture for large open-weights releases: DeepSeek V3 (671B total / 37B active), Qwen3 (235B total / 22B active), and several of Mistral's own later releases use it.
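The total and active figures quoted above can be turned into a single "active fraction" per model, which makes the efficiency gain easier to compare. The short sketch below uses only the parameter counts given in this guide; the function name is an illustrative assumption.

```python
# Total and active parameter counts in billions, as quoted in this guide.
MODELS = {
    "Mixtral 8x7B": (46.7, 12.9),
    "DeepSeek V3": (671.0, 37.0),
    "Qwen3": (235.0, 22.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used to process a single token."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name:13s} {active_b:5.1f}B of {total_b:5.1f}B active ({share:.1%})")
```

This works out to roughly 28% for Mixtral 8x7B, 6% for DeepSeek V3, and 9% for Qwen3. The saving applies to computation per token: all of the weights generally still have to be held in memory, so an MoE model is cheaper to run than a dense model of the same total size but not smaller to store.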

Closing the gap: 2024 and the frontier opens up

By mid-2024, the open-weights world had caught up enough that the question shifted from "can open models be useful?" to "can they match the very best closed models?"

Meta's Llama 3.1 405B (July 2024) was the clearest answer yet: yes, at least on many benchmarks. It was Meta's largest open-weights release, with multilingual support and extended context, and it was widely regarded as the first open model to genuinely compete with closed frontier models. Alibaba's Qwen team was also releasing rapidly — Qwen2 (June 2024), Qwen2.5 (September 2024), and a specialized coding variant, Qwen2.5-Coder-32B, that claimed parity with GPT-4o on coding benchmarks.

Mistral kept pace with Mixtral 8x22B (April 2024, Apache 2.0), which claimed to outperform all other open-weight models at the time on standard benchmarks, and then with Mistral Large 2 (123B, July 2024).

Meta also expanded the Llama family into new territory: Llama 3.2 (September 2024) added vision — the ability to understand images — and introduced tiny 1B and 3B models designed to run on phones and edge devices.

The reasoning revolution: DeepSeek changes the conversation

In early 2025, DeepSeek — a Chinese AI lab — released DeepSeek-R1 under the MIT license (one of the most permissive licenses possible, allowing even distillation into other models). The claim: performance on par with OpenAI's o1 on math, code, and reasoning tasks, with API pricing at $0.55 per million input tokens — a fraction of comparable closed-model costs. Six smaller distilled variants were also released, with the 32B and 70B versions reportedly matching OpenAI o1-mini.
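To see what per-token pricing means in practice, here is a small cost calculator. It uses the two R1 prices listed in this guide's key takeaways, read as $0.55 per million input tokens and $2.19 per million output tokens; the function name and the example workload are hypothetical.

```python
# Prices in USD per million tokens, as quoted in this guide for DeepSeek-R1.
INPUT_PRICE_PER_M = 0.55
OUTPUT_PRICE_PER_M = 2.19


def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_price: float = INPUT_PRICE_PER_M,
                 output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Cost of a workload billed separately for input and output tokens."""
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, each with 1,000 input tokens
    # and 500 output tokens.
    requests = 10_000
    cost = api_cost_usd(requests * 1_000, requests * 500)
    print(f"Estimated cost: ${cost:.2f}")
```

For that workload the input side costs $5.50 and the output side $10.95, for a total of $16.45. Reasoning models tend to produce long outputs, so the output price usually dominates the bill.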

This was a turning point. Reasoning models — which "think out loud" through a problem before answering — had been seen as a closed-model advantage. DeepSeek-R1 showed that wasn't necessarily true, and the MIT license meant anyone could build on it freely.

R1 was built on DeepSeek-V3 (December 2024), a 671B MoE model that generates 60 tokens per second, three times faster than its predecessor. DeepSeek has kept iterating since: the V4 preview pushed to 1.6 trillion total parameters with a 1-million-token context window and open weights.

The ecosystem matures: tools, multimodality, and new players

Open-weights progress isn't just about the models themselves — it's about the infrastructure that makes them usable.

llama.cpp is a piece of software that lets people run large language models on ordinary laptops and desktops, without expensive server hardware. In February 2026, Hugging Face — the platform that hosts most open-weights models — acquired both llama.cpp and its underlying library GGML, consolidating the key inference tools under one roof and securing their long-term development.
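One technique widely used to fit large models on ordinary machines is quantization: storing each weight with fewer bits and accepting a small loss of precision. The sketch below shows a simplified block-wise 4-bit scheme in plain Python. It is an illustration of the general idea only, not the actual GGML file format or llama.cpp code, and all function names are hypothetical.

```python
def quantize_block(values):
    """Map floats to integers in [-8, 7] plus one shared scale for the block."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quants = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the stored integers and scale."""
    return [q * scale for q in quants]


def quantize(weights, block_size=32):
    """Split the weights into fixed-size blocks and quantize each one."""
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]


def dequantize(blocks):
    """Rebuild the full (approximate) weight list from all blocks."""
    restored = []
    for scale, quants in blocks:
        restored.extend(dequantize_block(scale, quants))
    return restored


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    restored = dequantize(quantize(weights))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"largest rounding error: {worst:.4f}")
```

Assuming 32 weights per block and a 16-bit scale, this layout costs (32 × 4 + 16) / 32 = 4.5 bits per weight instead of 16, which is roughly a 3.5× reduction in memory. Each restored weight is off by at most half of its block's scale.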

Multimodal capabilities (understanding images, audio, and video alongside text) have also moved into the open-weights world. Qwen2.5-VL brought vision to the Qwen family; Llama 3.2 added it to Llama; Mistral released Voxtral for speech understanding; and Google DeepMind's Gemma 4 introduced an encoder-free multimodal architecture at the 12B scale.

Mistral has been particularly prolific, releasing a full stack: Magistral for reasoning, Devstral for coding agents, and Voxtral for speech. It then unified reasoning, vision, and coding in Mistral Small 4, a single 119B MoE model under Apache 2.0, with 40% lower latency than its predecessor and support across vLLM, llama.cpp, SGLang, and Transformers.

OpenAI joins the open-weights world

Perhaps the most symbolically significant development came in August 2025, when OpenAI, which had not released the weights of a language model since GPT-2 in 2019, published gpt-oss-120b and gpt-oss-20b under the Apache 2.0 license. The models were hosted on Hugging Face and claimed to outperform similarly sized open models on reasoning tasks. OpenAI framed the move around accessibility and global reach, and published a safety methodology, "malicious fine-tuning," to assess worst-case risks before the release.

The safety question grows louder

As open-weights models have become more capable, the safety debate has intensified. The core tension: the same openness that lets a researcher fine-tune a model for a medical application also lets a bad actor remove its safety guardrails.

OpenAI's malicious fine-tuning research (published alongside the gpt-oss release) attempted to quantify how much "uplift" — extra capability for harm — a fine-tuned open model could provide in biology and cybersecurity. Anthropic, meanwhile, publicly identified three Chinese AI labs — DeepSeek, Moonshot AI, and MiniMax — as conducting large-scale "distillation attacks" against Claude, generating over 16 million exchanges through roughly 24,000 fraudulent accounts to train open models on Claude's outputs. Anthropic framed this as a national security concern, arguing that illicitly distilled models strip out safety features.

Separately, Meta's own trajectory has grown more complicated: its Muse Spark model (April 2026), released through its new Superintelligence Labs, was deliberately closed — withholding parameter count, architecture, and training details — marking a notable departure from the open Llama strategy.

Where it's heading

The open-weights world in 2026 looks nothing like it did in 2022. The gap to closed frontier models has narrowed to the point where, on many benchmarks, the best open models are competitive with the best closed ones. The ecosystem of tools for running, fine-tuning, and deploying these models has matured and consolidated. And the roster of labs publishing open weights now includes not just Meta and Mistral but DeepSeek, Alibaba, Google DeepMind, and OpenAI itself.

The open questions are no longer about capability — they're about safety, licensing, and geopolitics. Who bears responsibility when an open model is misused? How do export controls apply to model weights? And as models grow more powerful, will the labs that have championed openness continue to do so — or follow Meta's Muse Spark toward a more closed future?

The open-weights landscape: key labs and their model families

Landmark open-weights models at a glance

| Model | Lab | Size | License | Notable first |
| --- | --- | --- | --- | --- |
| BLOOM | Hugging Face / BigScience | 176B | Open access | First large collaborative open multilingual LLM (2022) |
| Llama 2 | Meta | Multiple | Custom (open) | Meta's first broadly open-weights release (2023) |
| Mistral 7B | Mistral AI | 7B | Apache 2.0 | Outperformed Llama 2 13B at 7B scale (2023) |
| Mixtral 8x7B | Mistral AI | 46.7B total / 12.9B active | Apache 2.0 | First widely-used sparse MoE open model (2023) |
| Llama 3.1 405B | Meta | 405B | Open weights | First frontier-class open model (2024) |
| DeepSeek-R1 | DeepSeek | Multiple | MIT | Open reasoning model on par with OpenAI o1 (2025) |
| gpt-oss-120b | OpenAI | 120B | Apache 2.0 | OpenAI's first open-weights release (2025) |
| DeepSeek V4-Pro | DeepSeek | 1.6T total / 49B active | Open weights | 1M context open-weights model (2026) |

Timeline

  1. July 2022: BLOOM released as the first large collaborative open multilingual LLM
  2. July 2023: Meta releases Llama 2, sparking the modern open-weights era
  3. September 2023: Mistral 7B launches under Apache 2.0, outperforming Llama 2 13B
  4. December 2023: Mixtral 8x7B introduces sparse MoE to open weights
  5. July 2024: Llama 3.1 405B arrives as the first frontier-class open model
  6. April 2025: Qwen3 235B MoE claims parity with DeepSeek-R1 and Gemini 2.5 Pro
  7. August 2025: OpenAI releases gpt-oss-120b and gpt-oss-20b under Apache 2.0
  8. February 2026: Hugging Face acquires llama.cpp and GGML
  9. 2026: Google DeepMind releases Gemma 4 as its most capable open models

FAQ

What does 'open weights' actually mean?

It means the trained model's parameters — the numbers that define how it thinks — are published for anyone to download. You can run the model on your own hardware, modify it, or build products with it, rather than only accessing it through a company's API.

Are open-weights models as good as ChatGPT or Claude?

The gap has narrowed dramatically. Models like DeepSeek-R1 and Llama 3.1 405B have matched closed models on specific benchmarks, though the very latest closed frontier models still tend to lead on the hardest tasks.

Is it safe to use open-weights models?

It depends on the use case. Because anyone can fine-tune them, safety guardrails can be removed. OpenAI published research on exactly this risk before releasing its own open models, and Anthropic has argued that models illicitly distilled from Claude's outputs lose Claude's safety features.

What is a Mixture-of-Experts (MoE) model?

An MoE model has a large total number of parameters but only activates a small fraction for each token it processes — for example, Mixtral 8x7B has 46.7B total parameters but uses only 12.9B at a time, making it much faster and cheaper to run than a dense model of the same total size.

Where do I actually get these models?

Hugging Face is the main hub — nearly every model in this piece is available there. Hugging Face also now owns llama.cpp and GGML, the tools most commonly used to run large models on ordinary computers.
