7 · arXiv cs.CL (Computation and Language) · 40h ago

Qwen-AgentWorld: Language world models for general agent simulation and planning

Alibaba's Qwen team introduces Qwen-AgentWorld, a pair of language world models (35B-A3B and 397B-A17B) trained to simulate agentic environments across 7 domains using over 10M interaction trajectories. The models are trained via a three-stage pipeline of continued pretraining (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), and are evaluated on AgentWorldBench, a new benchmark constructed from 5 frontier models across 9 established benchmarks. Beyond simulation, the work demonstrates two downstream use cases: using the world model as a decoupled RL training environment and as a warm-up for agent foundation models. Both are reported to yield gains over baselines.

Frontier Model Releases Evaluation and Benchmarking Agent and Tool Ecosystem AgentWorldBench Qwen-AgentWorld-35B-A3B Alibaba Qwen-AgentWorld: Language World Models for General Agents QwenLM

Related guides (3)

Frontier Model Releases · Topic guide

Frontier Model Releases: The Race From Language to Action

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

6 · Qwen · 40h ago

Qwen releases AgentWorld-35B-A3B: a world-model and environment-simulation MoE for agents

Qwen has released Qwen-AgentWorld-35B-A3B on Hugging Face, a 35B-parameter mixture-of-experts (MoE) model with 3B active parameters, built on the Qwen3.5 MoE architecture. The model is tagged for world-model and environment-simulation use cases, suggesting it is designed to simulate environments for agent training or evaluation. It is paired with a dataset called AgentWorldBench, indicating an associated evaluation suite. Early engagement is minimal (0 downloads, 4 likes), but the model marks a notable direction in agent-environment modeling from a major open-weights lab.

Open Weights Progress Agent and Tool Ecosystem AgentWorldBench Qwen-AgentWorld-35B-A3B Qwen +1 more

7 · Hacker News · 1mo ago

Qwen3.7-Max: The Agent Frontier

Alibaba's Qwen team has announced Qwen3.7-Max, positioned as a frontier model for agentic tasks. The announcement appears on the official Qwen blog and drew substantial discussion on Hacker News (559 points, 217 comments). The model name suggests it belongs to the Qwen 3 generation, with a focus on agent capabilities.

Frontier Model Releases Open Weights Progress Alibaba Qwen Qwen2.5-Max +1 more

8 · Qwen Research · 1mo ago

Qwen2 Model Family Released: Five Sizes, 128K Context, Multilingual

Alibaba's Qwen team has released Qwen2, an evolution from Qwen1.5, comprising five pretrained and instruction-tuned models ranging from 0.5B to 72B parameters, including a 57B mixture-of-experts variant (57B-A14B). The release highlights training on 27 additional languages beyond English and Chinese, significantly improved coding and mathematics performance, and extended context support up to 128K tokens for the 7B and 72B instruct variants. Benchmark results are claimed to be state-of-the-art across a large number of evaluations.

Long Context Evolution Frontier Model Releases Qwen2-72B Qwen2.5 Qwen2-57B-A14B +4 more

8 · Qwen Research · 1mo ago

Qwen2.5-LLM: Alibaba releases open-weight language models from 0.5B to 72B

Alibaba's Qwen team releases the Qwen2.5 series of decoder-only dense language models, open-sourcing seven variants spanning 0.5B to 72B parameters. The release targets production use cases in the 10-30B range and mobile deployments at 3B scale. This represents a significant expansion of the open-weights frontier from a Tier 1 Chinese AI lab.

Frontier Model Releases Open Weights Progress Qwen2.5 Alibaba Qwen Team +4 more

7 · Qwen Research · 1mo ago

Generalizing an LLM from 8k to 1M Context using Qwen-Agent

Alibaba's Qwen team describes an agent built on Qwen2 (8k native context) that processes documents up to 1M tokens by decomposing retrieval and reasoning tasks, reportedly outperforming both RAG pipelines and native long-context models. The agent framework was also used to generate synthetic training data for fine-tuning new long-context Qwen models, creating a self-improvement loop. This positions agent-based context extension as a practical alternative to architectural long-context training.

Long Context Evolution Open Weights Progress RAG Qwen2.5 Alibaba +2 more

7 · arXiv cs.CL · 26d ago

Qwen-VLA: Unified Vision-Language-Action Model Across Robot Tasks, Environments, and Embodiments

Alibaba's Qwen team presents Qwen-VLA, a unified embodied foundation model that extends the Qwen vision-language stack to continuous action and trajectory generation via a DiT-based action decoder. The model is jointly pretrained on manipulation trajectories, egocentric demonstrations, synthetic simulation, and navigation data, with embodiment-aware prompt conditioning to support multiple robot platforms. A unified action-and-trajectory prediction framework covers manipulation, navigation, and trajectory prediction tasks. Reported results include 97.9% on LIBERO, 73.7% on Simpler-WidowX, 69.0% OSR on R2R navigation, and 76.9% average OOD success in real-world ALOHA experiments.

Frontier Model Releases Evaluation and Benchmarking Qwen-VLA DOMINO R2R +10 more

7 · Qwen Research · 1mo ago

Qwen2.5-Max: Large-Scale MoE Model Release by Alibaba's Qwen Team

Alibaba's Qwen team announces Qwen2.5-Max, a large-scale Mixture-of-Experts language model. The post acknowledges that scaling insights for very large MoE models have been limited, citing DeepSeek V3's recent disclosures as a reference point. The model is positioned as a frontier-scale MoE system developed concurrently with ongoing Qwen2 research.

Training Infrastructure Frontier Model Releases DeepSeek V4 Alibaba Qwen Team +3 more

7 · Qwen Research · 1mo ago

Qwen2.5-VL-32B: Reinforcement-Learning-Optimized Vision-Language Model Released

Alibaba's Qwen team has released Qwen2.5-VL-32B-Instruct, a 32-billion-parameter vision-language model built on the Qwen2.5-VL series and further optimized with reinforcement learning. The model is open-sourced under the Apache 2.0 license and available on Hugging Face and ModelScope. It follows the January 2025 launch of the broader Qwen2.5-VL series, positioning the 32B scale as a balance between capability and deployment practicality.

Open Weights Progress Inference Economics Qwen2.5-VL Qwen2.5-VL-32B-Instruct Apache 2.0 +5 more