Almanac
← Events
7arXiv cs.CL (Computation and Language)·40h ago

Qwen-AgentWorld: Language world models for general agent simulation and planning

Alibaba's Qwen team introduces Qwen-AgentWorld, a pair of language world models (35B-A3B and 397B-A17B) trained to simulate agentic environments across 7 domains using over 10M interaction trajectories. The models are trained via a three-stage pipeline (CPT, SFT, RL) and evaluated on AgentWorldBench, a new benchmark constructed from 5 frontier models across 9 established benchmarks. Beyond simulation, the work demonstrates two downstream use cases: using the world model as a decoupled RL training environment and as a warm-up for agent foundation models, both yielding gains over baselines.

Related guides (3)

Related events (8)

6Qwen·40h ago·source ↗

Qwen releases AgentWorld-35B-A3B: a world-model and environment-simulation MoE for agents

Qwen has released Qwen-AgentWorld-35B-A3B on Hugging Face, a 35B-parameter MoE model (3B active) built on the Qwen3.5 MoE architecture. The model is tagged for world-model and environment-simulation use cases, suggesting it is designed to simulate environments for agent training or evaluation. It is paired with a dataset called AgentWorldBench, indicating an associated evaluation suite. Early engagement is minimal (0 downloads, 4 likes) but the model represents a notable direction in agent-environment modeling from a major open-weights lab.

7Hacker News·1mo ago·source ↗

Qwen3.7-Max: The Agent Frontier

Alibaba's Qwen team has announced Qwen3.7-Max, positioned as a frontier model for agentic tasks. The announcement appears on the official Qwen blog and generated significant community discussion on Hacker News with 559 points and 217 comments. The model name suggests it is part of the Qwen 3 generation, with a focus on agent capabilities.

8Qwen Research·1mo ago·source ↗

Qwen2 Model Family Released: Five Sizes, 128K Context, Multilingual

Alibaba's Qwen team has released Qwen2, an evolution from Qwen1.5, comprising five pretrained and instruction-tuned models ranging from 0.5B to 72B parameters, including a 57B mixture-of-experts variant (57B-A14B). The release highlights training on 27 additional languages beyond English and Chinese, significantly improved coding and mathematics performance, and extended context support up to 128K tokens for the 7B and 72B instruct variants. Benchmark results are claimed to be state-of-the-art across a large number of evaluations.

8Qwen Research·1mo ago·source ↗

Qwen2.5-LLM: Alibaba releases open-weight language models from 0.5B to 72B

Alibaba's Qwen team releases the Qwen2.5 series of decoder-only dense language models, open-sourcing seven variants spanning 0.5B to 72B parameters. The release targets production use cases in the 10-30B range and mobile deployments at 3B scale. This represents a significant expansion of the open-weights frontier from a Tier 1 Chinese AI lab.

7Qwen Research·1mo ago·source ↗

Generalizing an LLM from 8k to 1M Context using Qwen-Agent

Alibaba's Qwen team describes an agent built on Qwen2 (8k native context) that processes documents up to 1M tokens by decomposing retrieval and reasoning tasks, reportedly outperforming both RAG pipelines and native long-context models. The agent framework was also used to generate synthetic training data for fine-tuning new long-context Qwen models, creating a self-improvement loop. This positions agent-based context extension as a practical alternative to architectural long-context training.

7arXiv · cs.CL·26d ago·source ↗

Qwen-VLA: Unified Vision-Language-Action Model Across Robot Tasks, Environments, and Embodiments

Alibaba's Qwen team presents Qwen-VLA, a unified embodied foundation model that extends the Qwen vision-language stack to continuous action and trajectory generation via a DiT-based action decoder. The model is jointly pretrained on diverse data spanning manipulation trajectories, egocentric demonstrations, synthetic simulation, and navigation data, with embodiment-aware prompt conditioning to support multiple robot platforms. A unified action-and-trajectory prediction framework covers manipulation, navigation, and trajectory prediction tasks. Benchmarks show strong results: 97.9% on LIBERO, 73.7% on Simpler-WidowX, 69.0% OSR on R2R navigation, and 76.9% average OOD success in real-world ALOHA experiments.

7Qwen Research·1mo ago·source ↗

Qwen2.5-Max: Large-Scale MoE Model Release by Alibaba's Qwen Team

Alibaba's Qwen team announces Qwen2.5-Max, a large-scale Mixture-of-Experts language model. The post acknowledges that scaling insights for very large MoE models have been limited, citing DeepSeek V3's recent disclosures as a reference point. The model is positioned as a frontier-scale MoE system developed concurrently with ongoing Qwen2 research.

7Qwen Research·1mo ago·source ↗

Qwen2.5-VL-32B: Reinforcement-Learning-Optimized Vision-Language Model Released

Alibaba's Qwen team has released Qwen2.5-VL-32B-Instruct, a 32-billion-parameter vision-language model built on the Qwen2.5-VL series and further optimized with reinforcement learning. The model is open-sourced under the Apache 2.0 license and available on Hugging Face and ModelScope. It follows the January 2025 launch of the broader Qwen2.5-VL series, positioning the 32B scale as a balance between capability and deployment practicality.