Entity · model

Qwen2.5

model · active · qwen2-5-5098933f · 14 events · first seen May 18, 2026

Aliases: Qwen2.5, Qwen2

More like this (12)

Qwen 2.5-7B · Qwen1.5 · Qwen 3.5 · Qwen2.5-3B · Qwen2.5-Max · Qwen 3.7 · Qwen3 · Qwen2.5-7B · Qwen2.5-Coder · Qwen2.5-VL · Qwen2-Audio · Qwen3.5-Plus

Recent events (14)

5 · arXiv · cs.CL · 39h ago

KV cache transplantation and precision controls explain stage-replay divergence in LLM inference

An arXiv preprint investigates why stage-replay diagnostics for LLMs diverge from live inference, finding that KV cache state, not token identity, is the causally sufficient carrier of divergent trajectories in a Qwen2.5-derived system. In a 200-item controlled experiment, BF16 precision produces disagreements on 166 suffixes, while FP32 eliminates decoded disagreements entirely. Bidirectional transplantation of all 48 KV layers causes continuations to follow the cache donor with 100% fidelity at two checkpoints (24/24 and 43/43). The findings bear directly on reproducibility and debugging of multi-step reasoning systems.

Evaluation and Benchmarking · Inference Economics · Stage-Replay Divergence Follows the KV Cache: Fixed-Prefix Precision Controls and Bidirectional Cache Transplantation · Qwen2.5
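
As a rough illustration of how reduced precision alone can flip a greedily decoded token, the sketch below emulates BF16 by rounding a float32 value to its upper 16 bits and compares the argmax with the FP32 result. The logit values and the helpers `to_f32`, `to_bf16`, and `greedy_token` are hypothetical and are not taken from the preprint, which attributes divergence to KV cache state rather than to this toy mechanism.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bf16(x: float) -> float:
    """Emulate BF16: keep the top 16 bits of the float32 encoding,
    rounding to nearest with ties to even. NaN and infinity are not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)   # round-to-nearest-even on the dropped bits
    bits = (bits + bias) & 0xFFFF0000    # clear the 16 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def greedy_token(logits):
    """Index of the largest logit; ties go to the lowest index."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical logits for three candidate tokens; the first two are nearly tied.
raw_logits = [10.01, 10.02, 3.0]

fp32_logits = [to_f32(v) for v in raw_logits]
bf16_logits = [to_bf16(v) for v in raw_logits]

print("FP32 pick:", greedy_token(fp32_logits))
print("BF16 pick:", greedy_token(bf16_logits))
```

Near 10, adjacent BF16 values are 0.0625 apart, so both near-tied logits round to 10.0 and the tie-break decides the token, while FP32 still separates them.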

5 · arXiv · cs.CL · Jul 24, 2026

AdaDSF: Adaptive Depth Sparse Framework for inference-efficient LLMs without retraining

Researchers introduce AdaDSF, a method that converts pre-trained LLMs into depth-sparse models without full retraining by using cosine similarity between layer inputs and outputs to identify redundant computation. A lightweight router selects informative tokens per layer, and a feature-preserving alignment objective maintains output quality. Evaluated on GPT-NeoX and Qwen2.5, AdaDSF reduces inference FLOPs while outperforming baselines including MoD, D-LLM, and DLO under comparable sparsity budgets.

Inference Economics · Qwen2.5 · GPT-NeoX · Adaptive Depth Sparse Framework · +1 more
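
The redundancy signal in this summary can be sketched in a few lines: a layer whose output is nearly parallel to its input changes the hidden state very little, which makes it a candidate for skipping. The example vectors, the 0.99 threshold, and the function names are illustrative assumptions, not AdaDSF's actual implementation; the router and alignment objective are not modeled.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def redundant_layers(layer_io, threshold=0.99):
    """Return indices of layers whose output is nearly parallel to its input.

    layer_io is a list of (hidden_in, hidden_out) pairs, one per layer.
    The threshold is a hypothetical choice for illustration.
    """
    return [
        i for i, (h_in, h_out) in enumerate(layer_io)
        if cosine_similarity(h_in, h_out) >= threshold
    ]

# Toy hidden states for three layers (made-up numbers).
layer_io = [
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),    # orthogonal: the layer changes a lot
    ([1.0, 2.0, 3.0], [1.01, 2.0, 2.99]),  # nearly unchanged: likely redundant
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),    # moderate change
]

print(redundant_layers(layer_io))
```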

4 · arXiv · cs.CL · Jul 20, 2026

BayesPO: Bayesian posterior sampling framework for discrete prompt optimization

BayesPO reframes prompt optimization as Bayesian posterior sampling over discrete prompt tokens, combining a task likelihood term with a language-model prior to form an energy-based objective. The framework uses a Metropolis-Hastings corrected Gibbs-with-Langevin proposal with parallel tempering to explore rugged LLM energy landscapes without updating model weights. Experiments on Qwen2.5 models show modest accuracy gains on instruction-induction tasks (60.04% to 63.23%), with identified limitations around overfitting and computational cost. The work positions principled probabilistic methods as an alternative to heuristic prompt search procedures.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Qwen2.5 · BayesPO · Metropolis-Hastings · +1 more
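
The general idea of sampling discrete prompts from an energy-based posterior can be sketched with plain Metropolis-Hastings. This is a deliberate simplification: it uses a symmetric single-token proposal in place of the paper's Gibbs-with-Langevin proposal, omits parallel tempering, and replaces the LLM-based likelihood and prior with toy stand-ins. `VOCAB`, `TARGET`, `energy`, and `mh_search` are all hypothetical names.

```python
import math
import random

VOCAB = ["summarize", "translate", "list", "the", "input", "briefly"]
TARGET = ["summarize", "the", "input"]  # toy stand-in for "what the task rewards"

def energy(prompt):
    """Toy energy (lower is better): task term plus prior term."""
    # Stand-in for the negative log task likelihood: count of mismatched tokens.
    task_loss = sum(p != t for p, t in zip(prompt, TARGET))
    # Stand-in for the negative log LM prior: penalize immediate repetitions.
    prior_penalty = 0.1 * sum(prompt[i] == prompt[i + 1] for i in range(len(prompt) - 1))
    return task_loss + prior_penalty

def accept_prob(e_old, e_new, temperature=1.0):
    """Metropolis acceptance probability for a symmetric proposal."""
    return min(1.0, math.exp(-(e_new - e_old) / temperature))

def mh_search(steps=500, seed=0):
    """Run the chain and return the lowest-energy prompt visited."""
    rng = random.Random(seed)
    current = [rng.choice(VOCAB) for _ in TARGET]
    best = list(current)
    for _ in range(steps):
        proposal = list(current)
        # Symmetric proposal: resample one uniformly chosen position.
        proposal[rng.randrange(len(proposal))] = rng.choice(VOCAB)
        if rng.random() < accept_prob(energy(current), energy(proposal)):
            current = proposal
        if energy(current) < energy(best):
            best = list(current)
    return best

best_prompt = mh_search()
print(best_prompt, energy(best_prompt))
```

No model weights are touched anywhere in the loop; only the prompt tokens change, which is the property the summary highlights.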

6 · arXiv · cs.LG · Jul 15, 2026

Information-theoretic framework establishes tight sample complexity laws for watermark forensics in generative models

A new arXiv preprint develops an information-theoretic framework for watermark forensics in generative model outputs, organizing detection, attribution, payload extraction, and localization into a 'forensic ladder' with precise sample complexity bounds. The main theorem establishes the first tight entropy-rate law for multi-user attribution: attributing text to one of N users costs Θ(log N/h) tokens under statistically distortion-free schemes, with a matching converse. The paper also identifies two fundamental gaps — a window where text is provably machine-made but unattributable, and a footprint-resolution uncertainty principle — validated experimentally on GPT-2, Pythia-410M, and Qwen2.5.

Evaluation and Benchmarking · AI Safety Research · Watermark Forensics for Generative Models: An Information-Theoretic Perspective · Qwen2.5 · GPT-2 · +1 more
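
To get a feel for the Θ(log N/h) scaling quoted above, the following sketch evaluates log2(N)/h for a few values. It assumes that h denotes a per-token entropy rate in bits and sets the hidden constant to 1; both are assumptions for illustration, since the summary gives neither the units nor the constant. The function name `attribution_tokens` is hypothetical.

```python
import math

def attribution_tokens(num_users: int, entropy_rate: float, constant: float = 1.0) -> float:
    """Token count implied by the log N / h scaling, up to an unknown constant.

    entropy_rate is assumed to be in bits per token; constant is a placeholder
    for the factor hidden by the Theta notation.
    """
    if num_users < 2 or entropy_rate <= 0:
        raise ValueError("need at least two users and a positive entropy rate")
    return constant * math.log2(num_users) / entropy_rate

# Doubling the number of users adds only 1/h tokens under this scaling.
for n in (1024, 2048, 1_000_000):
    print(n, attribution_tokens(n, entropy_rate=0.5))
```

The point of the scaling is that the cost grows logarithmically in the number of users but inversely in the entropy rate, so low-entropy text is what makes attribution expensive.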

7 · arXiv · cs.LG · Jul 2, 2026

Single transformer layer training can match full-parameter RL post-training in LLMs

A new arXiv paper challenges the assumption that all transformer layers contribute equally during RL post-training, finding that training a single layer can recover most or all of the gains from full-parameter RL. The authors introduce a 'layer contribution' metric and evaluate across seven models from the Qwen2.5 and Qwen3 families, three RL algorithms (GRPO, GiGPO, Dr. GRPO), and tasks including math reasoning, code, and agentic decision-making. A consistent structural pattern emerges: high-contribution layers concentrate in the middle of the transformer stack, and this ranking is stable across datasets, tasks, and algorithms.

Training Infrastructure · Inference Economics · Qwen2.5 · GRPO · Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training · +4 more

5 · arXiv · cs.AI · Jun 10, 2026

CLP: Lightweight collocation-length predictor achieves zero-loss multi-token inference speedup

Researchers propose CLP (Collocation-Length Predictor), a span-level decision layer for accelerating LLM inference via multi-token prediction without quality degradation. The key insight is 'Backbone-as-Architect': the backbone LM head always generates the first token while MTP heads handle only subsequent tokens, eliminating head-backbone competition that causes repetitive outputs in prior methods. CLP uses a single linear layer (~4.6K–7.7K parameters) versus 1M-parameter gate networks in prior work, achieving 1.14x–1.29x speedup on Qwen2.5 models with near-zero repetition ratio. The paper also establishes that shorter prediction horizons improve MTP head accuracy on larger models, offering a scaling-aware design principle.

Inference Economics · Qwen2.5 · Alibaba · CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference · +2 more

7 · arXiv · cs.CL · Jun 1, 2026

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

SCOPE is a data-free self-play framework for training language models on open-ended tasks without external supervision or frontier-model judges. It co-evolves two policies—a Challenger that generates document-grounded tasks and a Solver that answers via multi-turn retrieval—using a frozen copy of the initial model as a self-judge that writes task-specific rubrics. Across three 7-8B models (Qwen2.5, Qwen3, OLMo-3), SCOPE achieves up to +10.4 points on eight open-ended benchmarks and +13.8 points on seven held-out short-form QA benchmarks, matching or exceeding GRPO trained on ~9K curated prompts. Ablations identify rubric generation quality as the primary bottleneck for self-judging.

Evaluation and Benchmarking · Open Weights Progress · SCOPE · Qwen2.5 · self-play · +5 more

7 · Qwen Research · May 18, 2026

Generalizing an LLM from 8k to 1M Context using Qwen-Agent

Alibaba's Qwen team describes an agent built on Qwen2 (8k native context) that processes documents up to 1M tokens by decomposing retrieval and reasoning tasks, reportedly outperforming both RAG pipelines and native long-context models. The agent framework was also used to generate synthetic training data for fine-tuning new long-context Qwen models, creating a self-improvement loop. This positions agent-based context extension as a practical alternative to architectural long-context training.

Long Context Evolution · Open Weights Progress · RAG · Qwen2.5 · Alibaba · +2 more

8 · Qwen Research · May 18, 2026

Qwen2 Model Family Released: Five Sizes, 128K Context, Multilingual

Alibaba's Qwen team has released Qwen2, an evolution from Qwen1.5, comprising five pretrained and instruction-tuned models ranging from 0.5B to 72B parameters, including a 57B mixture-of-experts variant (57B-A14B). The release highlights training on 27 additional languages beyond English and Chinese, significantly improved coding and mathematics performance, and extended context support up to 128K tokens for the 7B and 72B instruct variants. Benchmark results are claimed to be state-of-the-art across a large number of evaluations.

Long Context Evolution · Frontier Model Releases · Qwen2-72B · Qwen2.5 · Qwen2-57B-A14B · +4 more

6 · Qwen Research · May 18, 2026

Introducing Qwen2-Math: Math-Specialized LLMs from Alibaba's Qwen Team

Alibaba's Qwen team has released Qwen2-Math and Qwen2-Math-Instruct, a series of math-specialized large language models built on the Qwen2 architecture. The models are designed to enhance arithmetic and mathematical reasoning capabilities in LLMs. The initial release supports English only, with bilingual English/Chinese versions announced as forthcoming.

Frontier Model Releases · Evaluation and Benchmarking · Qwen2-Math-Instruct · Qwen2.5 · Alibaba · +2 more

7 · Qwen Research · May 18, 2026

Qwen2-VL: Alibaba Releases Latest Vision-Language Model with Extended Video Understanding

Alibaba's Qwen team has released Qwen2-VL, the latest iteration of their vision-language model series built on the Qwen2 foundation. The team claims state-of-the-art performance on visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. A notable capability is understanding videos longer than 20 minutes for question answering, dialog, and content creation tasks.

Frontier Model Releases · Evaluation and Benchmarking · Qwen2.5-VL · RealWorldQA · DocVQA · +6 more

8 · Qwen Research · May 18, 2026

Qwen2.5-LLM: Alibaba releases open-weight language models from 0.5B to 72B

Alibaba's Qwen team releases the Qwen2.5 series of decoder-only dense language models, open-sourcing seven variants spanning 0.5B to 72B parameters. The release targets production use cases in the 10-30B range and mobile deployments at 3B scale. This represents a significant expansion of the open-weights frontier from a Tier 1 Chinese AI lab.

Frontier Model Releases · Open Weights Progress · Qwen2.5 · Alibaba · Qwen Team · +4 more

8 · Qwen Research · May 18, 2026

Qwen2.5: Large-Scale Open-Source Foundation Model Family Release

Alibaba's Qwen team has released Qwen2.5, described as potentially the largest open-source model release in history, following three months of development after Qwen2. The release encompasses a family of foundation models with improvements in knowledge and reasoning capabilities. The announcement targets developers who have been building on Qwen2 and incorporates feedback from that community.

Frontier Model Releases · Open Weights Progress · Qwen2.5 · Alibaba · Hugging Face · +2 more

7 · Qwen Research · May 18, 2026

Qwen2.5-Turbo Extends Context Length to 1M Tokens

Alibaba's Qwen team has released Qwen2.5-Turbo, extending the model's context window from 128K to 1 million tokens, which the announcement equates to approximately 1 million English words. The update includes optimizations for both model capabilities and inference performance at extreme context lengths. The model is available via API and through Hugging Face and ModelScope demos.

Long Context Evolution · Frontier Model Releases · Qwen2.5 · Alibaba · ModelScope · +3 more