Almanac
Guide · Beginner

Qwen: Alibaba's Open-Weight AI Model Family

QwenBeginneractive·v1 · live·generated 5d ago
TL;DRQwen is Alibaba's family of open-weight AI models, spanning text, code, images, audio, and video — released freely for anyone to download and run. What started as a language model experiment has grown into one of the most actively downloaded model families in the world, competing directly with frontier closed models while remaining openly available.

Key takeaways

  • Qwen models are released on Hugging Face and ModelScope under open weights, with some variants reaching millions of downloads within days of release.
  • The flagship Qwen3-Coder-480B is a 480-billion-parameter coding model claiming performance comparable to Claude Sonnet 4 on agentic coding tasks.
  • The family spans a huge size range — from a 0.8B model small enough to run on a phone to a 480B model requiring serious hardware — giving developers options at every scale.
  • Qwen2.5-1M extended open-weight context to 1 million tokens, matching a capability previously only seen in proprietary models.
  • Qwen's research team publishes training innovations (like GSPO for stable reinforcement learning) alongside model releases, contributing to the broader AI research community.
  • Mistral's Mistral Small 4 explicitly benchmarks against Qwen models, signaling Qwen's status as a key competitive reference point in the open-weights world.

What Qwen is

Qwen is Alibaba's family of open-weight AI models — software that can read, write, reason, code, look at images, and listen to audio. "Open-weight" means the model files are published for anyone to download and run, rather than being locked behind a paid API. The Qwen team, part of Alibaba, has been releasing models publicly since at least 2022 and has grown into one of the most prolific and widely-downloaded AI model families in the world.

Think of Qwen less like a single product and more like a platform: dozens of models at different sizes and specializations, all sharing a common lineage and design philosophy.

Why it matters to you

If you work in tech, Qwen is relevant for a simple reason: it gives you access to genuinely powerful AI that you can run on your own infrastructure, customize, and deploy without per-query fees or data-sharing concerns. The Qwen3.5-4B model alone has been downloaded over 10 million times on Hugging Face — a sign that developers worldwide are actively building with it.

For organizations worried about cost, privacy, or vendor lock-in, open-weight models like Qwen are the alternative to sending data to a third-party cloud.

What the family includes

Qwen covers a wide range of use cases:

  • Text and reasoning: The core language models handle writing, question-answering, and analysis. The QwQ-32B line uses reinforcement learning (a training technique that rewards correct answers) to push reasoning quality further, drawing comparisons to DeepSeek R1's approach.
  • Code: Qwen3-Coder-480B is a 480-billion-parameter model focused on writing and debugging code, running automated coding agents, and using tools — claiming performance comparable to Anthropic's Claude Sonnet 4 on these tasks.
  • Images: Models like Qwen-VL and the Qwen3.5/3.6 series can look at pictures and answer questions about them, extract text from images, and handle high-resolution inputs.
  • Audio: Qwen2-Audio accepts both audio and text inputs, extending the family into spoken language.
  • All-at-once: Qwen2.5-Omni is a 7-billion-parameter model that handles text, images, audio, and video simultaneously, responding in real time with both text and synthesized speech.

The size ladder — something for everyone

One of Qwen's most practical strengths is its range. The family runs from 0.8 billion parameters (small enough to run on a laptop or edge device) up to 480 billion (requiring serious server hardware). In between sit popular sizes like 4B, 7B, 9B, 14B, 27B, and 35B.

Many of the larger models use a design called Mixture of Experts (MoE): the model has a large total parameter count but only activates a small fraction for each input. For example, the Qwen3.5-35B-A3B model has 35 billion parameters but only uses about 3 billion at a time — giving you much of the quality at a fraction of the compute cost.

Long context: reading entire codebases or documents

A "context window" is how much text a model can read at once. Qwen2.5-1M extended open-weight Qwen models to 1 million tokens — roughly 750,000 words, or an entire large codebase — making it possible to ask questions about or summarize very large documents in a single pass.

Research alongside releases

The Qwen team doesn't just ship models; it publishes the research behind them. Recent work includes GSPO, a new training algorithm designed to prevent AI models from becoming unstable during extended reinforcement learning runs — a known problem that limits how much you can improve a model through this technique. This kind of published research benefits the whole AI community, not just Qwen users.

Recent developments

The Qwen3.5 and Qwen3.6 series (released in early-to-mid 2026) brought multimodal capabilities — image understanding alongside text — to nearly every size in the lineup, from 0.8B to 122B. Qwen3.7-Max, announced in May 2026, is positioned as a frontier model for "agentic" tasks: long, multi-step jobs where the AI takes actions (browsing, writing files, calling tools) with minimal human supervision.

Qwen's competitive standing is also visible in how rivals talk about it: Mistral's Mistral Small 4 explicitly benchmarks itself against Qwen models, treating them as the standard to beat in the open-weights space.

Where it's heading

The trajectory points toward more capable agents (models that can autonomously complete complex tasks), larger and more efficient MoE architectures, and continued expansion of multimodal capabilities. The consistent pattern — release early, release often, publish the research — suggests Qwen will remain a central reference point for anyone building with open AI models.

The Qwen model family at a glance

Timeline

  1. OFA unified multimodal model published — early Qwen-team research

  2. Qwen series retrospective published; Qwen-VL-Plus and Qwen-VL-Max launched

  3. Qwen2-Audio released — audio modality added to the family

  4. QwQ-32B-Preview released — Qwen's first deep-reasoning model

  5. Qwen2.5-1M open-weights models reach 1M-token context window

  6. QwQ-32B and Qwen2.5-Omni released — RL-scaled reasoning and full omni-modal model

  7. Qwen3-Coder-480B released — flagship agentic coding model

  8. Qwen3.7-Max announced — frontier agentic model

Related topics

AlibabaHugging FaceModelScopeMicrosoft AzureMixture of ExpertsQwen2-AudioQwen2.5-Math-PRMMistral AIDeepSeek V4

FAQ

Is Qwen free to use?

Most Qwen models are released as open weights on Hugging Face and ModelScope, meaning you can download and run them yourself. Some are also available via Alibaba's cloud APIs.

What can Qwen models actually do?

Different models in the family handle different tasks: text conversation, writing code, understanding images, processing audio, and running multi-step "agent" tasks like browsing the web or editing files.

How does Qwen compare to ChatGPT or Claude?

Qwen's flagship models claim benchmark results comparable to leading closed models like Claude Sonnet 4 on coding tasks, while being freely downloadable — the key difference is you can run Qwen yourself rather than paying per use.

Do I need a powerful computer to run Qwen?

It depends on the model — Qwen offers sizes from 0.8 billion parameters (runs on modest hardware) all the way to 480 billion (requires a cluster). Most developers start with the 7B or 14B variants.

What is a Mixture-of-Experts model?

It's a design where only a fraction of the model's total parameters are "active" for any given input — for example, Qwen3-Coder-480B activates just 35B of its 480B parameters at a time, making it faster and cheaper to run than its total size suggests.

Stay current

Call Me Almanac pairs the week's AI news with guides like this one — Midweek & Sunday.

Versions

  • v1live5d ago

Related guides (4)

More on Qwen (6)

7Qwen Research·1mo ago·source ↗

GSPO: Group Sequence Policy Optimization for Scalable RL Training of Language Models

Qwen researchers introduce Group Sequence Policy Optimization (GSPO), a new RL algorithm designed to address severe training instability and model collapse observed in existing methods like GRPO during extended training runs. The core motivation is enabling stable RL scaling for language models to improve reasoning and problem-solving capabilities with increased compute. The paper targets a known bottleneck in post-training pipelines where instability prevents further performance gains.

8Qwen Research·1mo ago·source ↗

Qwen3-Coder: 480B MoE Agentic Coding Model Released by Alibaba/Qwen Team

Alibaba's Qwen team has released Qwen3-Coder, a family of code-focused models with the flagship variant being Qwen3-Coder-480B-A35B-Instruct, a 480B-parameter Mixture-of-Experts model with 35B active parameters. It supports 256K native context length and up to 1M tokens via extrapolation. The model claims state-of-the-art results among open-weight models on agentic coding, browser-use, and tool-use benchmarks, with performance described as comparable to Claude Sonnet 4.

7Qwen Research·1mo ago·source ↗

Qwen2.5-Omni: Alibaba Releases End-to-End Multimodal Model with Real-Time Streaming

Alibaba's Qwen team releases Qwen2.5-Omni, a 7B-parameter end-to-end multimodal model capable of processing text, images, audio, and video simultaneously. The model delivers real-time streaming responses in both text and natural speech synthesis. It is openly available on Hugging Face, ModelScope, DashScope, and GitHub, accompanied by a technical paper.

7Qwen Research·1mo ago·source ↗

QwQ-32B: Scaling Reinforcement Learning for Enhanced Reasoning

Alibaba's Qwen team releases QwQ-32B, a 32-billion parameter model trained with scaled Reinforcement Learning to improve reasoning capabilities beyond conventional pretraining and post-training methods. The release draws explicit comparison to DeepSeek R1's cold-start and multi-stage RL training approach. The model is available via Qwen Chat, Hugging Face, ModelScope, and a demo interface. This represents Qwen's exploration of RL scalability as a path to enhanced LLM intelligence.

6Qwen Research·1mo ago·source ↗

Global-batch Load Balancing for MoE LLM Training from Qwen

Qwen Research introduces a global-batch load balancing technique for Mixture-of-Experts (MoE) LLM training, claiming it is nearly a 'free lunch' improvement. The method addresses expert load imbalance across training batches, a known efficiency and quality bottleneck in MoE architectures. The approach targets the router and expert activation dynamics in transformer-based MoE layers.

7Qwen Research·1mo ago·source ↗

QwQ-32B-Preview: Alibaba's Qwen Reasoning Model with Deep Reflection Capabilities

Alibaba's Qwen team has released QwQ-32B-Preview, a 32-billion parameter model designed for deep reasoning across mathematics, code, and general knowledge. The model is positioned as a reasoning-focused system that emphasizes uncertainty and iterative questioning as core design principles. It is available on GitHub, Hugging Face, ModelScope, and via a demo interface.