Almanac
Guide · In-depth

Qwen: Alibaba's Open-Weight AI Lab Reshaping the Frontier

QwenIn-depthactive·v1 · live·generated 5d ago
TL;DRQwen is Alibaba's AI research team and model family, which has grown from a single open-source LLM into one of the most prolific open-weight labs in the world. Its strategy combines relentless release cadence across scales and modalities with genuine frontier ambition — pushing MoE architectures, RL-based reasoning, and agentic coding to levels that compete directly with closed frontier labs. The breadth of community adoption on Hugging Face and the depth of its post-training research signal that Qwen has become a structural fixture of the open-weights ecosystem.

Key takeaways

  • Qwen3-Coder-480B-A35B-Instruct, the flagship coding model, claims state-of-the-art open-weight results on agentic coding and tool-use benchmarks, with performance described as comparable to Claude Sonnet 4.
  • The Qwen3.5 series spans 0.8B to 122B parameters with both dense and MoE variants, all supporting image-text-to-text; the 4B instruct model alone has exceeded 10 million Hugging Face downloads.
  • QwQ-32B applies scaled reinforcement learning to reasoning, explicitly drawing comparison to DeepSeek R1's multi-stage RL training approach.
  • Qwen researchers published GSPO (Group Sequence Policy Optimization) to address training instability and model collapse in extended RL runs — a known bottleneck that limits how far post-training can scale.
  • The Qwen2.5-Omni 7B model processes text, images, audio, and video simultaneously with real-time streaming output, making it one of the most modality-complete small open models available.
  • Qwen's research output extends beyond model releases into MoE training infrastructure (global-batch load balancing), reward modeling (Skill-RM), and RL data engineering (SAERL), indicating a vertically integrated research program.

What Qwen is

Qwen is the AI research team and model family operated by Alibaba. It began as a single open-source LLM release and has since grown into one of the most prolific open-weight AI programs in the world, spanning language, vision, audio, video, code, and reasoning — across parameter scales from 0.8B to 480B. The team publishes both model weights and the underlying research, making it a significant contributor to the broader open-weights ecosystem as well as a direct competitor to frontier closed labs.

Origins and early trajectory

Qwen's lineage traces to Alibaba's multimodal pretraining work: OFA (One-For-All), a unified model for understanding and generation across modalities, appeared in late 2022, followed by OFASys, a training framework designed to reduce the engineering overhead of multi-task, multi-modal pipelines. The Qwen-7B open-source LLM launched roughly a year later, and a January 2024 retrospective post consolidated the team's public positioning. Early multimodal extensions — Qwen-VL-Plus and Qwen-VL-Max — added high-definition image support (exceeding one million pixels) and substantially improved visual reasoning. CodeQwen1.5 followed in April 2024 as an explicit open-source alternative to proprietary coding assistants, citing cost, privacy, and copyright concerns.

The model family: architecture and scale

Qwen's current portfolio is organized around several overlapping axes:

Dense vs. MoE. Qwen deploys both dense transformers and Mixture-of-Experts architectures. MoE models activate only a fraction of total parameters per token (e.g. 35B active out of 480B total in Qwen3-Coder, or ~3B active out of 35B in Qwen3.5-35B-A3B), enabling large-model capacity at reduced inference cost. Qwen researchers have published infrastructure work to support this — including a global-batch load balancing technique for MoE training that addresses expert activation imbalance.

Scale ladder. The Qwen3.5 generation spans 0.8B, 2B, 4B, 9B, 27B, 35B (MoE), and 122B (MoE) — a deliberate ladder covering edge deployment through datacenter inference. Community uptake is substantial: the 4B instruct model has exceeded 10 million Hugging Face downloads; the 35B MoE variant has over 2.8 million; the 0.8B model over 2.7 million despite its sub-1B scale.

Multimodal coverage. Most Qwen3.5 and Qwen3.6 models are image-text-to-text, supporting both conversational and Azure endpoint deployment. Qwen2.5-Omni extends this to a 7B model that simultaneously processes text, images, audio, and video with real-time streaming output in both text and natural speech — one of the most modality-complete small open models in the bundle.

Long context. Qwen2.5-1M released open-weight 7B and 14B models with 1M-token context windows in January 2025, following an earlier proprietary Qwen2.5-Turbo upgrade. Qwen3-Coder supports 256K natively and up to 1M via extrapolation.

Reasoning and RL post-training

A distinct thread in Qwen's work is the application of reinforcement learning to improve reasoning beyond what pretraining and standard RLHF achieve. QwQ-32B-Preview (November 2024) introduced a reasoning-focused model emphasizing uncertainty and iterative self-questioning. The full QwQ-32B (March 2025) applied scaled RL training, explicitly drawing comparison to DeepSeek R1's cold-start and multi-stage RL approach.

The team has also published foundational RL research: GSPO (Group Sequence Policy Optimization) addresses the training instability and model collapse observed in methods like GRPO during extended RL runs — a bottleneck that limits how far post-training compute can be pushed. Complementary work includes SAERL, which uses Sparse Autoencoders to guide RL fine-tuning data engineering (achieving 3% accuracy gains and 20% fewer training steps on Qwen2.5-Math-1.5B with GRPO), and Skill-RM, a reward modeling framework that treats evaluation as a reusable agentic skill rather than a static judge.

The Qwen2.5-Math Process Reward Model supervises intermediate reasoning steps rather than only final answers — addressing the failure mode where models produce plausible but flawed derivations while reaching correct conclusions.

Agentic coding: Qwen3-Coder

The flagship agentic release is Qwen3-Coder-480B-A35B-Instruct (July 2025), a 480B MoE model with 35B active parameters and 256K native context. The team claims state-of-the-art results among open-weight models on agentic coding, browser-use, and tool-use benchmarks, with performance described as comparable to Claude Sonnet 4. This positions Qwen3-Coder as the open-weight answer to closed frontier coding agents. The Qwen3.7-Max model (May 2026) extends the frontier agentic positioning into the Qwen 3 generation more broadly.

Evaluation infrastructure

Qwen has begun building evaluation tooling alongside its models. Qwen-Image-Bench (May 2026) is a bilingual (English/Chinese) judge model for evaluating text-to-image outputs — a signal that the team is investing in the measurement layer, not just the model layer.

Ecosystem and deployment footprint

Qwen models are distributed via Hugging Face, ModelScope, DashScope, and GitHub, with Azure endpoint compatibility across the Qwen3.5 and Qwen3.6 families. Third-party inference frameworks — vLLM, llama.cpp, SGLang, Transformers — support the weights. Open Interpreter, a Python coding agent framework with nearly 64,000 GitHub stars, lists Qwen among its supported open models alongside DeepSeek and Kimi. Mistral's own competitive analysis (Mistral Small 4) names Qwen models as a benchmark comparison target, confirming Qwen's standing as a reference point for open-weight model evaluation.

Where it's heading

The trajectory across the events bundle points in three directions simultaneously: (1) continued MoE scaling toward frontier capability, with Qwen3-Coder and Qwen3.7-Max as the current leading edge; (2) deeper RL post-training infrastructure, with GSPO and related work addressing the stability bottlenecks that constrain how much reasoning can be extracted from a given model; and (3) broader modality coverage, with omni-modal and visual reasoning models filling out the capability surface. The combination of high-volume open-weight releases, vertically integrated research, and deep community adoption makes Qwen one of the defining forces in the open-weights frontier.

Qwen model family: capability axes

Selected Qwen model variants across generations

ModelParams (total / active)ModalitiesNotable capability
Qwen3-Coder-480B-A35B480B / 35B (MoE)Text, codeSOTA open-weight agentic coding; 256K native / 1M extrapolated context
Qwen3.7-MaxTextFrontier agentic tasks (Qwen 3 generation)
Qwen3.5-122B-A10B122B / 10B (MoE)Image + textMultimodal MoE; Azure-compatible; 840K HF downloads
Qwen3.5-35B-A3B35B / 3B (MoE)Image + text2.8M+ HF downloads; 1,400+ likes
Qwen3.5-27B27B (dense)Image + text~3M HF downloads
Qwen2.5-Omni7B (dense)Text, image, audio, videoReal-time streaming text + speech output
QwQ-32B32B (dense)TextScaled RL reasoning; DeepSeek R1-comparable training approach
QVQ-72B-Preview72B (dense)Image + textVisual reasoning extension of QwQ line

Cells marked — indicate the events bundle does not disclose the value.

Timeline

  1. OFA unified multimodal pretrained model introduced — early foundation of Qwen's multimodal research

  2. OFASys multitask multimodal training framework released

  3. Qwen series retrospective published; Qwen-VL-Plus and Qwen-VL-Max launched with HD image support

  4. CodeQwen1.5 released as open-source coding LLM alternative to proprietary assistants

  5. Qwen2-Audio released — audio + text input, text output

  6. QwQ-32B-Preview released — deep reasoning with uncertainty and iterative questioning

  7. QVQ-72B-Preview released — visual reasoning at 72B scale

  8. Qwen2.5-1M open-weight models released (7B and 14B) with 1M-token context

  9. QwQ-32B (full release) and Qwen2.5-Omni ship; scaled RL reasoning and omni-modal streaming established

  10. Qwen3-Coder-480B-A35B released; GSPO RL stability paper published

  11. Qwen3.5 series launches across 0.8B–122B with dense and MoE multimodal variants

  12. Qwen3.7-Max announced as frontier agentic model; Qwen-Image-Bench evaluation model released

Related topics

AlibabaHugging FaceModelScopeMicrosoft AzureMixture of ExpertsQwen2-AudioQwen2.5-Math-PRMMistral AIDeepSeek V4

FAQ

Is Qwen open-source or proprietary?

Qwen releases open-weight models on Hugging Face, ModelScope, and GitHub, though some proprietary API variants (e.g. Qwen2.5-Turbo) exist alongside them.

How does Qwen's MoE strategy work?

Qwen's MoE models (e.g. 480B total / 35B active, or 35B total / 3B active) activate only a fraction of parameters per forward pass, enabling large-model capacity at lower inference cost — a pattern Qwen applies across both coding and multimodal families.

What distinguishes QwQ from the main Qwen line?

QwQ models are trained with scaled reinforcement learning specifically to improve multi-step reasoning, emphasizing uncertainty and iterative self-questioning rather than direct answer generation.

Where can Qwen models be deployed?

Models are available via Hugging Face, ModelScope, DashScope, GitHub, and are compatible with Azure deployment endpoints; inference frameworks including vLLM, llama.cpp, SGLang, and Transformers are supported.

How does Qwen3-Coder compare to closed models?

The Qwen team describes Qwen3-Coder-480B-A35B-Instruct as achieving performance comparable to Claude Sonnet 4 on agentic coding, browser-use, and tool-use benchmarks among open-weight models.

What is GSPO and why does it matter?

GSPO (Group Sequence Policy Optimization) is a Qwen-developed RL algorithm that addresses training instability and model collapse seen in methods like GRPO during extended runs, enabling further scaling of post-training compute for reasoning improvements.

Stay current

Call Me Almanac pairs the week's AI news with guides like this one — Midweek & Sunday.

Versions

  • v1live5d ago

Related guides (4)

More on Qwen (6)

7Qwen Research·1mo ago·source ↗

GSPO: Group Sequence Policy Optimization for Scalable RL Training of Language Models

Qwen researchers introduce Group Sequence Policy Optimization (GSPO), a new RL algorithm designed to address severe training instability and model collapse observed in existing methods like GRPO during extended training runs. The core motivation is enabling stable RL scaling for language models to improve reasoning and problem-solving capabilities with increased compute. The paper targets a known bottleneck in post-training pipelines where instability prevents further performance gains.

8Qwen Research·1mo ago·source ↗

Qwen3-Coder: 480B MoE Agentic Coding Model Released by Alibaba/Qwen Team

Alibaba's Qwen team has released Qwen3-Coder, a family of code-focused models with the flagship variant being Qwen3-Coder-480B-A35B-Instruct, a 480B-parameter Mixture-of-Experts model with 35B active parameters. It supports 256K native context length and up to 1M tokens via extrapolation. The model claims state-of-the-art results among open-weight models on agentic coding, browser-use, and tool-use benchmarks, with performance described as comparable to Claude Sonnet 4.

7Qwen Research·1mo ago·source ↗

Qwen2.5-Omni: Alibaba Releases End-to-End Multimodal Model with Real-Time Streaming

Alibaba's Qwen team releases Qwen2.5-Omni, a 7B-parameter end-to-end multimodal model capable of processing text, images, audio, and video simultaneously. The model delivers real-time streaming responses in both text and natural speech synthesis. It is openly available on Hugging Face, ModelScope, DashScope, and GitHub, accompanied by a technical paper.

7Qwen Research·1mo ago·source ↗

QwQ-32B: Scaling Reinforcement Learning for Enhanced Reasoning

Alibaba's Qwen team releases QwQ-32B, a 32-billion parameter model trained with scaled Reinforcement Learning to improve reasoning capabilities beyond conventional pretraining and post-training methods. The release draws explicit comparison to DeepSeek R1's cold-start and multi-stage RL training approach. The model is available via Qwen Chat, Hugging Face, ModelScope, and a demo interface. This represents Qwen's exploration of RL scalability as a path to enhanced LLM intelligence.

6Qwen Research·1mo ago·source ↗

Global-batch Load Balancing for MoE LLM Training from Qwen

Qwen Research introduces a global-batch load balancing technique for Mixture-of-Experts (MoE) LLM training, claiming it is nearly a 'free lunch' improvement. The method addresses expert load imbalance across training batches, a known efficiency and quality bottleneck in MoE architectures. The approach targets the router and expert activation dynamics in transformer-based MoE layers.

7Qwen Research·1mo ago·source ↗

QwQ-32B-Preview: Alibaba's Qwen Reasoning Model with Deep Reflection Capabilities

Alibaba's Qwen team has released QwQ-32B-Preview, a 32-billion parameter model designed for deep reasoning across mathematics, code, and general knowledge. The model is positioned as a reasoning-focused system that emphasizes uncertainty and iterative questioning as core design principles. It is available on GitHub, Hugging Face, ModelScope, and via a demo interface.