Entity · company

Qwen

company · active · qwen-b28afe34 · 61 events · first seen May 18, 2026

Aliases: Qwen

More like this (12)

Qwen3, Qwen Team, Qwen-Image, Qwen Chat, Qwen 3.7, Qwen 3.5, Qwen1.5, Qwen VLo, Qwen2.5, Qwen-VL, Qwen 2.5-7B, Qwen API

Guides (1)

Qwen

Qwen: Alibaba's Open-Weights AI Lab Competing at the Frontier

Recent events (50)

6arXiv · cs.CL·5d ago·source ↗

Skill Self-Play: Co-evolving LLM capabilities via structured self-play with dynamic skill routing

Researchers introduce Skill Self-Play (Skill-SP), a reinforcement learning framework that addresses the diversity-vs-verifiability dilemma in LLM self-evolution by using agent skills as a middle ground. The system comprises a proposer, solver, and dynamic skill controller that co-evolve in a continuous loop: the proposer generates tasks conditioned on sampled skills, the solver explores solutions, and the skill controller updates an expanding skill library based on execution feedback. Evaluations on tool-use and reasoning benchmarks show consistent performance gains on capable backbones and recovery for initially misaligned models. Code is released under the Qwen-Applications GitHub organization, suggesting Alibaba/Qwen team involvement.

Frontier Model Releases Agent and Tool Ecosystem Skill Self-Play: Pushing the Frontier of LLM Capability with Co-Evolving Skills Alibaba Qwen +2 more

5arXiv · cs.CL·Jul 23, 2026·source ↗

Theoretical and empirical analysis of long-term temporal portability of LoRA patches across continual pretraining updates

This arXiv paper investigates PortLLM, a training-free and data-free scheme for adapting LLMs after continual pretraining, extending prior short-term results to 10 continual pretraining steps across Mistral, Gemma, and Qwen base models. The authors find that LoRA patches remain portable across longer update horizons, suggesting repeated fine-tuning is unnecessary when base models are periodically updated. Two theoretical analyses are offered, identifying near-orthogonality of high-dimensional vectors as the geometric mechanism underlying temporal portability. The work has practical implications for reducing fine-tuning overhead in production deployments with frequently updated base models.
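The geometric claim in this summary, that high-dimensional vectors are nearly orthogonal, can be checked numerically. The sketch below is a generic illustration with random Gaussian vectors, not the paper's analysis; the function names, dimensions, and pair counts are assumptions chosen for the demo.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random by normalizing a Gaussian sample."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_abs_cosine(dim, pairs, seed=0):
    """Average |cosine similarity| over independent random pairs of unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += abs(sum(x * y for x, y in zip(a, b)))
    return total / pairs


# The average overlap shrinks roughly like 1/sqrt(dim) as the dimension grows.
for dim in (8, 128, 2048):
    print(dim, round(mean_abs_cosine(dim, pairs=50), 4))
```

If the same effect holds between a LoRA patch and a continual-pretraining update, the two would interfere very little, which is the intuition behind applying an old patch to an updated base model.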

Inference Economics Enterprise Deployment Patterns Gemma LoRA PortLLM +2 more

4arXiv · cs.CL·Jul 22, 2026·source ↗

DAIS: Dependency-aware intermediate QA supervision improves complex reasoning in LLMs

Researchers introduce DAIS (Dependency-Aware Intermediate QA Supervision), a training-time framework that converts teacher rationales into stage-level QA records where each intermediate step is conditioned on prior reasoning states. Evaluated on GDPR, AIACT, MedQA, and FOLIO benchmarks using Qwen backbones, DAIS outperforms answer-only, flat chain-of-thought, and independent-QA baselines, with gains of up to 5.6% (4.2% on average) on policy-compliance tasks. Ablations confirm that dependency conditioning contributes beyond simply adding more intermediate text, which suggests it can serve as a lightweight auxiliary supervision signal.

Evaluation and Benchmarking Alignment and RLHF DAIS MedQA Qwen +2 more

4arXiv · cs.CL·Jul 20, 2026·source ↗

MLIR-based compilation method for LLM inference on specialized hardware

Researchers present an MLIR-based compiler pipeline for deploying large language models on AI accelerators, using two dialect layers (TopOp for framework-agnostic graph representation and TpuOp for hardware-specific lowering). The method splits each Transformer layer into three static compilation stages (prefill, prefill_kv, decode) to handle the distinct computational profiles of prompt processing and autoregressive generation. The approach is implemented in the open-source TPU-MLIR compiler and LLM-TPU project, supporting Qwen, Llama, InternVL, and MiniCPM-V families with GPTQ, AWQ, and AutoRound quantization.

Training Infrastructure Inference Economics Sophgo AWQ Qwen +5 more

4arXiv · cs.CL·Jul 20, 2026·source ↗

ToolSciVer: Tool-augmented reinforcement learning for multimodal scientific claim verification

Researchers introduce ToolSciVer, a framework that equips vision-language models with three type-aware visual tools (table focus, chart-to-structure parsing, high-resolution zoom) to verify scientific claims grounded in figures, tables, and charts from papers. The policy is trained using Group Relative Policy Optimization (GRPO) with a composite reward covering correctness, format, tool-use efficiency, and validity. Experiments across five VLMs from three model families (Qwen, InternVL, Gemma) on SciVer and MuSciClaims benchmarks show improvements over prompting-based and RL-based baselines. The work is notable as the first tool-augmented framework specifically targeting multimodal scientific claim verification.
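As a rough illustration of the training signal described here, the sketch below combines four reward terms into one scalar and then computes group-relative advantages in the GRPO style, where each rollout is scored against the mean and standard deviation of its own sampling group. The weights and the example scores are hypothetical; the summary does not state how the paper weights the terms.

```python
from statistics import mean, pstdev

# Hypothetical weights for the four reward terms named in the summary.
WEIGHTS = {"correctness": 1.0, "format": 0.2, "efficiency": 0.1, "validity": 0.2}


def composite_reward(scores):
    """Weighted sum of per-term scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the statistics of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts sampled for the same claim.
rollouts = [
    {"correctness": 1, "format": 1, "efficiency": 0.5, "validity": 1},
    {"correctness": 0, "format": 1, "efficiency": 1.0, "validity": 1},
    {"correctness": 0, "format": 0, "efficiency": 0.0, "validity": 0},
    {"correctness": 1, "format": 1, "efficiency": 1.0, "validity": 1},
]
rewards = [composite_reward(r) for r in rollouts]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Because advantages are relative to the group, no separate value model is needed; a rollout is only rewarded for beating its siblings on the same prompt.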

Evaluation and Benchmarking Agent and Tool Ecosystem InternVL MuSciClaims Gemma +5 more

6The Batch·Jul 16, 2026·source ↗

Data Points: PrismML fits 27B model on iPhone; Cognition SWE-1.7, Nvidia Audex, Anthropic language-value study

A newsletter digest covers four notable AI developments: PrismML (a Caltech/Khosla spinout) compressed Alibaba's Qwen 27B model to under 4 GB via ternary/binary quantization for on-device iPhone inference; Cognition released SWE-1.7 (trained on Kimi K2.7), jumping from 9.4% to 42.3% on FrontierCode 1.1 Main with novel RL and infrastructure techniques; Nvidia introduced Audex, a 30B unified audio-text transformer trained on 157B audio tokens; and Anthropic published research showing Claude's expressed values shift measurably by language across 309,815 conversations. Each item represents a distinct technical development across on-device inference, coding agents, multimodal models, and model behavior analysis.
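The size claim for the PrismML item can be sanity-checked with simple arithmetic. The sketch below assumes exactly 27 billion weights and reads "4 GB" as 4e9 bytes; it ignores scale factors, embeddings, and file metadata, and it does not reflect PrismML's actual storage format, which the digest does not describe.

```python
import math


def bits_per_weight(size_bytes, n_params):
    """Average storage budget per parameter for a given file size."""
    return size_bytes * 8 / n_params


def packed_size_gb(n_params, bits):
    """Size in GB (1e9 bytes) if every parameter used `bits` bits."""
    return n_params * bits / 8 / 1e9


N_PARAMS = 27e9                # assumed parameter count
TERNARY_BITS = math.log2(3)    # information content of one ternary weight

budget = bits_per_weight(4e9, N_PARAMS)
ternary_only = packed_size_gb(N_PARAMS, TERNARY_BITS)
binary_only = packed_size_gb(N_PARAMS, 1.0)

print(f"budget under 4 GB: {budget:.3f} bits/weight")
print(f"all-ternary size:  {ternary_only:.2f} GB")
print(f"all-binary size:   {binary_only:.3f} GB")
```

Under these assumptions an all-ternary model would not fit in 4 GB while an all-binary one would, which is consistent with the digest describing a mix of ternary and binary quantization.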

Inference Economics Agent and Tool Ecosystem Kimi K2 Claude Sonnet Claude Opus 4.6 +18 more

6arXiv · cs.CL·Jul 15, 2026·source ↗

One-Word Census: Answer-choice conformity measured across 44 language models

Researchers introduce the One-Word Census, a minimal 31-prompt instrument that probes which one-word answers language models select from open-ended categories, applied to 44 models. Convergence is extreme — 41% of models chose 'serendipity' when asked to pick any word — yet conformity varies fourfold across models in structured ways: persona- and community-tuned models diverge most, while newest mainline flagships conform most. Within four model lineages (Claude, GPT, Qwen, Grok), conformity rises with each generation but reverses for the latest Claude and GPT flagships, suggesting possible repositioning. The field is more lexically concentrated than human norms in 18 of 20 shared categories.

Frontier Model Releases Evaluation and Benchmarking Grok Claude Qwen +4 more

6arXiv · cs.LG·Jul 9, 2026·source ↗

Analysis-driven transformer linearization outperforms prior baselines on LLaMA and Qwen up to 32B

A new arXiv paper analyzes why post-hoc linearization of causal self-attention degrades model quality, identifying key-dependent rank-1 orthogonal projections as the mechanism softmax relies on and explaining why delta-style networks outperform gated accumulation. The authors introduce structural interventions—sink tokens, short convolutions, and fixed-budget cache routing—applied in a frozen-backbone regime. Scaling across LLaMA and Qwen models up to 32B parameters, the approach outperforms prior post-hoc linearization baselines on MMLU and matches long-context retrieval of adaptive-caching frameworks.

Long Context Evolution Inference Economics Qwen The Key to Going Linear: Analysis-Driven Transformer Linearization Llama +1 more

6The Batch·Jul 8, 2026·source ↗

The Batch digest: China bans anthropomorphic bots, DiffusionGemma, Anthropic Claude Code study, Seedance 2.5, Code Arena

A multi-story digest covers five distinct AI developments: ByteDance and Alibaba are shutting down customizable humanlike AI agents ahead of China's July 15 Interim Measures for AI-Based Anthropomorphic Interactive Services; Google released DiffusionGemma, an experimental 26B MoE diffusion-based text model generating 256-token blocks at 1,000+ tokens/sec on H100; Anthropic published findings from 400,000 Claude Code sessions showing domain expertise—not coding skill—drives agentic output volume; Seedance released version 2.5 of its video generator with higher resolution and longer clips; and Arena.ai expanded Code Arena to fullstack web development evaluation. The China regulatory action is the most significant item, representing a concrete enforcement moment for AI persona/companion regulation.

Frontier Model Releases Evaluation and Benchmarking Seedance 2.0 Doubao DiffusionGemma +13 more

6arXiv · cs.CL·Jul 1, 2026·source ↗

Signed-Permutation Gauge Theory for RMSNorm Transformers Improves Coordinate Transport

A new arXiv preprint formalizes the residual-stream gauge symmetry of transformer architectures, showing that RMSNorm models have a signed-permutation gauge group B_d = S_d ⋉ {±1}^d rather than the permutation-only S_d of LayerNorm models. The authors introduce sign-marginalized Hungarian matching and demonstrate that coordinate-preserving transport along fine-tuning trajectories recovers 91.1% of cross-run coordinates versus 60.3% for endpoint matching. Practical consequences include dramatically improved sparse autoencoder reconstruction (NMSE 0.004 vs 1.08), preserved steering vector effects, and correct AdamW optimizer state transfer — with implications for mechanistic interpretability, model merging, and activation engineering.

Evaluation and Benchmarking AI Safety Research AdamW TinyLlama Qwen +2 more

6arXiv · cs.LG·Jul 1, 2026·source ↗

Surrogate Fidelity: Open LLMs often cannot reliably explain closed model behavior

A new arXiv paper from Facebook Research evaluates whether mechanistic interpretability findings from open-weight models transfer to closed API-only models across prediction, attribution, and representation levels. Studying eleven models across four families (Llama, Qwen, GPT, Gemini), the authors find that prediction-level agreement substantially overstates attribution fidelity — models that agree on answers often disagree on why. They document an 'access-validity inversion' where white-box signals like attention patterns are stable across models but weakly predictive of causal attributions, undermining the common practice of using open surrogates to explain closed systems.

Evaluation and Benchmarking AI Safety Research Qwen Surrogate Fidelity: When Can Open LLMs Explain Closed Ones? Llama +3 more

5arXiv · cs.CL·Jun 30, 2026·source ↗

Multi-agent system using open-source LLMs outperforms GPT-4 on disinformation detection

A new arXiv preprint proposes a multi-agent system for automated disinformation detection that emulates human annotator decision-making through consensus mechanisms, cognitive diversity, and hierarchical structure. The system uses open-source models (LLaMA, Kimi, Qwen, DeepSeek, LLaMA-Nemotron) and is evaluated on English, Polish, Slovak, and Bulgarian datasets across three fact-checking tasks. The authors claim superior performance over individual LLMs, including GPT-4 and GPT-3.5, with transparency benefits from using open-weights models.

Open Weights Progress Agent and Tool Ecosystem Llama Nemotron Kimi DeepSeek V4 +5 more

6arXiv · cs.AI·Jun 30, 2026·source ↗

MESA framework proactively ranks vulnerable communication channels in multi-agent systems

Researchers introduce MESA, a label-free framework for prioritizing security-critical communication edges in multi-agent systems (MAS) before attacks are observed. The framework combines six graph-theoretic metrics with two dynamic probes (ablation and masking) to rank edges by compromise risk, without requiring attack traces. Evaluated across three MAS scenarios, eight network topologies, and five open-source LLMs, MESA achieves mean Spearman ρ=+0.60 correlation with empirical per-edge attack success, and monitoring the top 10% of ranked edges intercepts roughly 3x more successful attacks than random allocation. The work highlights that attack impact in MAS is highly concentrated — a single compromised edge can account for up to 75% of total attack success.
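The two headline numbers in this summary, a Spearman correlation between predicted risk and observed attack success, and the share of attacks caught by watching the top-ranked edges, can be computed as sketched below. The edge scores and success counts are made-up illustrative data, and the helper names are assumptions; this does not reproduce MESA's metrics or probes.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def top_fraction_coverage(risk, successes, fraction=0.1):
    """Share of successful attacks that fall on the highest-risk edges."""
    k = max(1, round(len(risk) * fraction))
    top = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)[:k]
    total = sum(successes)
    return sum(successes[i] for i in top) / total if total else 0.0


# Hypothetical per-edge risk scores and observed successful attacks.
risk = [0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.05]
successes = [75, 0, 0, 5, 0, 5, 0, 5, 10, 0]
print(round(spearman(risk, successes), 3))
print(top_fraction_coverage(risk, successes, fraction=0.1))
```

In this toy data a single edge carries 75% of the successful attacks, mirroring the concentration effect the summary describes.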

AI Safety Research Agent and Tool Ecosystem MESA MESA: Prioritizing Vulnerable Communication Channels for Securing Multi-Agent Systems Gemma +3 more

5Qwen·Jun 26, 2026·source ↗

Qwen releases Qwen3-ASR-1.7B, a multilingual automatic speech recognition model

Qwen has released Qwen3-ASR-1.7B on Hugging Face, a 1.7B parameter automatic speech recognition model supporting multiple languages including Chinese (Mandarin and Cantonese), English, Arabic, German, French, and Spanish. The model uses the Qwen3 architecture and is released in safetensors format. This extends the Qwen3 model family into the speech domain.

Open Weights Progress Multimodal Progress Qwen3-ASR-0.6B Qwen Hugging Face

4Qwen·Jun 26, 2026·source ↗

Qwen releases Qwen3-ASR-0.6B, a multilingual automatic speech recognition model

Alibaba's Qwen team released Qwen3-ASR-0.6B on Hugging Face, a 0.6B parameter automatic speech recognition model supporting multiple languages including Chinese, English, Cantonese, Arabic, German, French, and Spanish. The model uses the Qwen3 architecture and is available in Hugging Face-native format with safetensors weights. This is a compact ASR model extending the Qwen3 family into speech recognition.

Open Weights Progress Multimodal Progress Qwen3-ASR-0.6B Qwen Hugging Face

3Qwen·Jun 26, 2026·source ↗

Qwen releases Qwen3-ForcedAligner-0.6B, a multilingual token-classification model for ASR alignment

Qwen released Qwen3-ForcedAligner-0.6B on Hugging Face, a 0.6B parameter token-classification model tagged for ASR (automatic speech recognition) forced alignment tasks. The model supports multiple languages including Chinese, English, Cantonese, French, German, Italian, Japanese, and Korean. This is a specialized small model extending the Qwen3 family into speech-text alignment use cases.

Open Weights Progress Qwen3-ForcedAligner-0.6B Qwen Hugging Face

6arXiv · cs.CL·Jun 25, 2026·source ↗

SafeVec and RAS: White-box LLM safety evaluation via internal refusal representations

Researchers introduce SafeVec, a white-box safety evaluation procedure that measures LLM safety from internal hidden-state representations rather than generated outputs. The method extracts layer-wise refusal directions from a safety-aligned reference model, identifies stable layers where safe and unsafe behaviors are separable, and scores target models via a calibrated 0-100 Refusal Alignment Score (RAS). Evaluated across Llama, Gemma, and Qwen model families, RAS distinguishes aligned from uncensored/abliterated variants and correlates with output-level attack success rates while being substantially faster than judge-based evaluation. The approach addresses key limitations of output-level safety evals: cost, judge sensitivity, and dependence on fixed question banks.

Evaluation and Benchmarking AI Safety Research SafeVec Gemma RAS: Measuring LLM Safety Through Refusal Alignment +2 more

3arXiv · cs.CL·Jun 24, 2026·source ↗

First Turkish phone scam detection dataset evaluated across seven LLMs in multi-modal settings

Researchers introduce the first public multi-modal dataset of 100 aligned audio-transcript pairs of Turkish scam and benign phone calls, evaluating seven LLMs (Gemini 2.5 Flash/Flash-Lite/Pro, GPT-4o, Qwen Max/Plus/Turbo) under three input conditions. Transcript-based inputs consistently outperform direct audio processing, while human-corrected and uncorrected transcripts perform comparably. The work addresses a gap in low-resource language safety research and highlights the need for linguistically inclusive fraud detection systems.

AI Safety Research Multimodal Progress Google GPT-4o Gemini-2.5-Flash-Lite +3 more

6Qwen·Jun 24, 2026·source ↗

Qwen releases AgentWorld-35B-A3B: a world-model and environment-simulation MoE for agents

Qwen has released Qwen-AgentWorld-35B-A3B on Hugging Face, a 35B-parameter MoE model (3B active) built on the Qwen3.5 MoE architecture. The model is tagged for world-model and environment-simulation use cases, suggesting it is designed to simulate environments for agent training or evaluation. It is paired with a dataset called AgentWorldBench, indicating an associated evaluation suite. Early engagement is minimal (0 downloads, 4 likes) but the model represents a notable direction in agent-environment modeling from a major open-weights lab.

Open Weights Progress Agent and Tool Ecosystem AgentWorldBench Qwen-AgentWorld-35B-A3B Qwen +1 more

5arXiv · cs.CL·Jun 23, 2026·source ↗

KDoS framework proposes distribution-optimized synthetic data for LLM knowledge injection

Researchers introduce KDoS (Knowledge Distribution-optimized Synthesis), a framework that uses a three-stage feedback mechanism guided by 'knowledge density' to optimize the distribution of synthetic training data for LLMs. Rather than stopping at preset token counts or fixed ratios, KDoS dynamically adjusts synthesis to avoid sparse or redundant domain coverage. Experiments across Qwen, Ling, and LLaMA models (0.6B–16B parameters) on 1B–5B token scales show consistent improvements over baselines on six knowledge benchmarks. A key finding is that an optimal knowledge distribution exists and remains stable across model families and scales.

Evaluation and Benchmarking KDoS Qwen Llama +1 more

6arXiv · cs.CL·Jun 17, 2026·source ↗

RubricsTree: Scalable hierarchical rubric framework for evaluating personal health AI agents

RubricsTree is a new evaluation framework for LLM-powered personal health agents, built around a hierarchical taxonomy of over 100 clinically-verifiable Boolean rubrics derived from 4,000 real user queries and curated with physician oversight. A context-aware router activates only relevant rubrics per query, enabling scalable yet expert-aligned evaluation. The framework outperforms strong LLM-as-a-judge baselines on expert alignment and, when used as training signal, yields up to ~66% relative gains on HealthBench across Gemini, GPT, and Qwen model families. The work addresses a concrete bottleneck in clinical deployment of health AI: the cost-quality tradeoff in evaluation.

Evaluation and Benchmarking AI Safety Research HealthBench RubricsTree Qwen +2 more

5arXiv · cs.CL·Jun 15, 2026·source ↗

RePro: Retrospective Progress-Aware Self-Refinement for LLM Agent Training

Researchers introduce RePro (Retrospective Progress-Aware Training), a framework addressing the gap between step-wise RL optimization and metacognitive task-progress awareness in LLM agents. The approach uses a forward-then-reflect rollout paradigm where agents execute actions online and then retrospectively assess step-wise progress given the completed trajectory and known outcome. Evaluated on WebShop, ALFWorld, and Sokoban, RePro achieves up to 12% absolute success rate gains over baseline Qwen-family models without requiring continuous external supervision.

Agent and Tool Ecosystem Alignment and RLHF ALFWorld Sokoban RePro +2 more

4GitHub Trending·Jun 14, 2026·source ↗

Open Interpreter: lightweight coding agent for open models (DeepSeek, Kimi, Qwen)

Open Interpreter is an open-source Python coding agent framework supporting open-weight models including DeepSeek, Kimi, and Qwen. The project has accumulated nearly 64,000 GitHub stars, with 45 new stars on the trending day. It provides a lightweight harness for running code-executing agents on locally-hosted or open models.

Open Weights Progress Agent and Tool Ecosystem Kimi DeepSeek V4 Qwen +1 more

5arXiv · cs.AI·Jun 11, 2026·source ↗

Reroute: Training-free recoverable visual token routing for vision-language models

A new arXiv preprint proposes Reroute, a training-free plug-in that replaces the standard rank-and-remove visual token pruning paradigm in VLMs with a recoverable routing mechanism. Instead of permanently discarding low-ranked tokens, Reroute defers them to re-enter the candidate pool at later decoder stages, addressing the problem that token importance shifts across decoder depth. Evaluated on LLaVA-1.5 and Qwen backbones augmented with FastV, PDrop, and Nüwa pruning methods, Reroute improves grounding performance under aggressive token reduction without sacrificing general VQA accuracy. The approach preserves the theoretical compute and KV-cache budget of the underlying pruning method.
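The contrast between rank-and-remove pruning and recoverable routing can be shown with a toy example. This is only a sketch of the idea as summarized above: tokens are plain indices, the per-stage importance scores are invented, and real deferred tokens would carry hidden states that this code does not model. It is not the paper's algorithm.

```python
def rank_and_remove(stage_scores, budget):
    """Baseline: a token dropped at one stage can never come back."""
    alive = list(range(len(stage_scores[0])))
    history = []
    for scores in stage_scores:
        alive = sorted(alive, key=lambda t: scores[t], reverse=True)[:budget]
        history.append(sorted(alive))
    return history


def recoverable_routing(stage_scores, budget):
    """Deferred tokens stay in the candidate pool and may be re-selected later."""
    n_tokens = len(stage_scores[0])
    history = []
    for scores in stage_scores:
        pool = range(n_tokens)  # active and deferred tokens together
        active = sorted(pool, key=lambda t: scores[t], reverse=True)[:budget]
        history.append(sorted(active))
    return history


# Hypothetical importance of 4 visual tokens at two decoder stages.
# Token 2 looks unimportant early on but matters most at the later stage.
stage_scores = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.8, 0.9, 0.2],
]
print(rank_and_remove(stage_scores, budget=2))      # token 2 is lost for good
print(recoverable_routing(stage_scores, budget=2))  # token 2 re-enters at stage 2
```

Both strategies keep the same number of active tokens per stage, which matches the summary's point that the compute and KV-cache budget of the underlying pruning method is preserved.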

Inference Economics Multimodal Progress FastV PDrop Qwen +4 more

7Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-35B-A3B multimodal MoE model on Hugging Face

Qwen has released Qwen3.5-35B-A3B, a 35B-parameter mixture-of-experts image-text-to-text model with approximately 3B active parameters, published on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 2.8 million downloads and 1,400+ likes, it has seen substantial community uptake.

Frontier Model Releases Open Weights Progress Qwen3.5-35B-A3B Qwen +1 more

7Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-27B multimodal model on Hugging Face

Qwen has released Qwen3.5-27B, a 27-billion parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With nearly 3 million downloads and 981 likes, it has seen substantial community uptake.

Frontier Model Releases Open Weights Progress Qwen3.6-27B Qwen Hugging Face +1 more

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-35B-A3B-Base multimodal MoE model on Hugging Face

Qwen has released Qwen3.5-35B-A3B-Base, a 35B-parameter mixture-of-experts image-text-to-text base model on Hugging Face, activating approximately 3B parameters per forward pass. The model supports conversational use and is compatible with Azure deployment endpoints. With over 109K downloads, it represents a notable open-weights multimodal MoE release from the Qwen team.

Frontier Model Releases Open Weights Progress Qwen3.5-35B-A3B-Base Qwen Hugging Face +1 more

7Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-122B-A10B multimodal MoE model on Hugging Face

Qwen has released Qwen3.5-122B-A10B, a 122B-parameter mixture-of-experts image-text-to-text model with 10B active parameters, published on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. High download counts (840K) and likes (564) suggest rapid community uptake shortly after release.

Frontier Model Releases Open Weights Progress Microsoft Azure Qwen Qwen3.5-122B-A10B +2 more

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-9B-Base multimodal model on Hugging Face

Qwen has released Qwen3.5-9B-Base, a 9-billion-parameter image-text-to-text base model on Hugging Face. The model supports conversational use and is compatible with the transformers library and inference endpoints. With over 153,000 downloads, it has seen substantial early adoption.

Frontier Model Releases Open Weights Progress Qwen3.5-2B-Base Qwen +1 more

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-9B multimodal model on Hugging Face

Qwen has released Qwen3.5-9B, a 9-billion parameter image-text-to-text model, on Hugging Face. The model supports conversational use cases and is compatible with Azure deployment endpoints. With over 9 million downloads and 1,500+ likes, it has seen substantial community uptake.

Frontier Model Releases Open Weights Progress Microsoft Azure Qwen3-4B Qwen +2 more

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-4B-Base multimodal model on Hugging Face

Qwen has released Qwen3.5-4B-Base, a 4-billion parameter base model supporting image-text-to-text tasks, published on Hugging Face. The model is tagged as conversational and endpoints-compatible, using the safetensors format. With over 207,000 downloads, it represents a new entry in the Qwen3.5 model family with multimodal capabilities at a small parameter count.

Frontier Model Releases Open Weights Progress Qwen Qwen3-4B-Base Hugging Face +1 more

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-4B multimodal model on Hugging Face

Qwen has released Qwen3.5-4B, a 4-billion parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 10 million downloads and 604 likes, it has seen substantial community uptake.

Open Weights Progress Multimodal Progress Microsoft Azure Qwen3-4B Qwen +1 more

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-2B multimodal model on Hugging Face

Alibaba's Qwen team released Qwen3.5-2B, a 2-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With nearly 2 million downloads, it has seen substantial community uptake.

Open Weights Progress Multimodal Progress Qwen3.5-2B-Base Microsoft Azure Qwen +1 more

5Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-0.8B multimodal model on Hugging Face

Alibaba's Qwen team released Qwen3.5-0.8B, a small-scale image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 2.7 million downloads and 562 likes, it has seen substantial community uptake for a sub-1B parameter multimodal model.

Open Weights Progress Multimodal Progress Qwen3.5-0.8B Microsoft Azure Qwen +1 more

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-2B-Base multimodal model on Hugging Face

Qwen released Qwen3.5-2B-Base, a 2-billion parameter base model supporting image-text-to-text tasks, on Hugging Face. The model is tagged as conversational and endpoints-compatible, suggesting deployment readiness. With nearly 180K downloads, it has seen significant early adoption in the open-weights community.

Frontier Model Releases Open Weights Progress Qwen3.5-2B-Base Qwen Hugging Face +1 more

5Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.5-0.8B-Base multimodal model on Hugging Face

Qwen has released Qwen3.5-0.8B-Base, a small 0.8B parameter image-text-to-text base model on Hugging Face. The model supports conversational use and is compatible with Hugging Face endpoints. With nearly 200K downloads, it signals meaningful community uptake for a compact multimodal base model.

Open Weights Progress Multimodal Progress Qwen3-8B-Base Qwen Hugging Face

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.6-35B-A3B multimodal MoE model on Hugging Face

Qwen published Qwen3.6-35B-A3B, a 35B-parameter mixture-of-experts image-text-to-text model with 3B active parameters, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 5.9 million downloads and 2,000 likes, it has seen substantial community uptake.

Frontier Model Releases Open Weights Progress Qwen3.6-35B-A3B Microsoft Azure Qwen +2 more

6Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen3.6-27B multimodal model on Hugging Face

Qwen published Qwen3.6-27B, a 27-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 5.4 million downloads and 1,619 likes, it has seen substantial community uptake.

Frontier Model Releases Open Weights Progress Qwen3.6-27B Qwen Hugging Face +1 more

5Qwen·Jun 5, 2026·source ↗

Qwen releases Qwen-Image-Bench, a multimodal judge/evaluation model

Qwen has released Qwen-Image-Bench on Hugging Face, an image-text-to-text model tagged as a judge-model for evaluation and benchmarking purposes. The model supports both English and Chinese and appears designed to evaluate text-to-image outputs. With 8,572 downloads and 50 likes shortly after release, it has attracted modest early interest.

Evaluation and Benchmarking Open Weights Progress Qwen-Image-Bench Qwen Hugging Face +1 more

5arXiv · cs.AI·Jun 4, 2026·source ↗

GeM-NR: Training-free multi-view editing for nonrigid 3D scene changes

GeM-NR is a training-free method for multi-view consistent image editing that handles nonrigid edits — changes that substantially alter scene geometry and appearance — a capability that existing methods largely lack. Given an anchor image edited by a backbone model (FLUX, Qwen, or BrushNet) and an unedited query image, the method propagates the edit consistently across viewpoints via depth estimation, point-cloud alignment, projection, and conditioned refinement. The authors report state-of-the-art performance on edit quality and geometric/photometric consistency across multiple views, including generation of 3D representations of edited scenes.

Multimodal Progress BrushNet Qwen GeM-NR +1 more

5arXiv · cs.CL·Jun 3, 2026·source ↗

Knowledge editing via locate-then-edit transferred to masked diffusion language models, revealing multi-token failure mode

A new arXiv paper investigates whether locate-then-edit knowledge editing methods, developed for autoregressive models, transfer to masked diffusion language models (MDMs) such as LLaDA and Dream. The authors find that causal tracing identifies the same early-to-mid-layer MLP location in both paradigms, but MDMs degrade systematically on multi-token edits due to partially unmasked intermediate states that the edit was never optimized for. A correction targeting these intermediate states substantially restores multi-token editing performance. The work is the first systematic comparison of knowledge editing across autoregressive and diffusion-based language model paradigms.

Evaluation and Benchmarking Open Weights Progress Knowledge Editing in Masked Diffusion Language Models Qwen Llama +2 more

6arXiv · cs.LG·Jun 3, 2026·source ↗

Skill-RM: A unified reward model framework treating evaluation as an agentic skill

Researchers from the Qwen team propose Skill-RM, a framework that reformulates reward modeling as the execution of a reusable 'Reward-Evaluation Skill,' enabling a single model to orchestrate heterogeneous evaluation criteria including rule-based verifiers, ground-truth references, and rubrics. By treating reward computation as a structured agentic task, Skill-RM dynamically selects and aggregates evidence per input rather than relying on static evaluation. Experiments on reward benchmarks and downstream tasks (best-of-N selection, RL) show consistent improvements over traditional judge baselines. The code is publicly released under the Qwen-Applications GitHub organization.

Evaluation and Benchmarking Agent and Tool Ecosystem Skill-RM Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill Alibaba +2 more

6arXiv · cs.CL·May 27, 2026·source ↗

SAERL: Using Sparse Autoencoders to Guide LLM Reinforcement Learning Data Engineering

SAERL is a post-training data engineering framework that uses Sparse Autoencoders (SAEs) — a mechanistic interpretability tool — to extract intrinsic model signals for controlling data diversity, difficulty, and quality during RL fine-tuning. The framework applies SAE-space clustering for batch diversity, a difficulty proxy for curriculum ordering, and a quality probe for data filtering. On Qwen2.5-Math-1.5B with GRPO, SAERL achieves 3% average accuracy improvement and reaches target accuracy with 20% fewer training steps. SAE representations transfer across model families and scales, suggesting broad applicability as a lightweight data engineering tool.

Training Infrastructure Evaluation and Benchmarking mechanistic interpretability GRPO Reinforcement Learning from Human Feedback +6 more

6arXiv · cs.CL·May 21, 2026·source ↗

DelTA: Discriminative Token Credit Assignment for RLVR Training

DelTA introduces a discriminative token credit assignment method for reinforcement learning from verifiable rewards (RLVR) that addresses the problem of high-frequency formatting tokens dominating policy gradient updates. The method estimates per-token coefficients to amplify side-specific gradient directions and downweight shared or weakly discriminative ones, making the effective update direction more contrastive. On seven mathematical benchmarks, DelTA outperforms same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base respectively, with additional gains on code generation tasks.

Frontier Model Releases Evaluation and Benchmarking DelTA Qwen3-8B-Base policy gradient +5 more
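
The idea in the DelTA summary above, downweighting tokens shared by correct and incorrect responses so they stop dominating the update, can be shown with a toy example. This is explicitly not DelTA's estimator, which the summary says operates on gradient directions. The sketch swaps in a much simpler frequency-contrast heuristic, and the names `token_coefficients` and `weighted_token_advantages` are hypothetical.

```python
from collections import Counter


def token_coefficients(pos_responses, neg_responses):
    """Toy per-token coefficient in [0, 1] (a stand-in heuristic).

    0 means the token is equally frequent in correct (pos) and incorrect
    (neg) responses, as formatting tokens often are; 1 means it appears
    on one side only.
    """
    pos = Counter(t for r in pos_responses for t in r)
    neg = Counter(t for r in neg_responses for t in r)
    n_pos = sum(pos.values()) or 1  # avoid division by zero on empty input
    n_neg = sum(neg.values()) or 1
    coeffs = {}
    for tok in set(pos) | set(neg):
        p = pos[tok] / n_pos  # relative frequency among correct responses
        q = neg[tok] / n_neg  # relative frequency among incorrect responses
        coeffs[tok] = abs(p - q) / (p + q)
    return coeffs


def weighted_token_advantages(response, advantage, coeffs):
    # Scale the sequence-level advantage per token; unseen tokens keep weight 1.
    return [advantage * coeffs.get(tok, 1.0) for tok in response]


pos = [["<think>", "x", "=", "4"]]  # a response the verifier accepted
neg = [["<think>", "x", "=", "5"]]  # a response the verifier rejected
coeffs = token_coefficients(pos, neg)
weights = weighted_token_advantages(pos[0], 1.0, coeffs)
```

In this toy case only the final token differs between the two responses, so it alone keeps a non-zero weight, while the shared `<think>`, `x`, and `=` tokens contribute nothing to the update.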

7Hacker News·May 20, 2026·source ↗

Qwen3.7-Max: The Agent Frontier

Alibaba's Qwen team has announced Qwen3.7-Max, positioned as a frontier model for agentic tasks. The announcement appears on the official Qwen blog and drew 559 points and 217 comments on Hacker News. The model name suggests it belongs to the Qwen 3 generation, with a focus on agent capabilities.

Frontier Model Releases Open Weights Progress Alibaba Qwen Qwen2.5-Max +1 more

8Mistral AI News·May 18, 2026·source ↗

Mistral Small 4: Unified Multimodal, Reasoning, and Coding MoE Model Released Under Apache 2.0

Mistral AI has released Mistral Small 4, a 119B-parameter Mixture-of-Experts model (6B active per token) that unifies capabilities previously split across Magistral (reasoning), Pixtral (multimodal), and Devstral (coding agents) into a single open-weights model. The model features a 256k context window, configurable reasoning effort via a `reasoning_effort` parameter, and native text and image input support, and it is released under Apache 2.0. Mistral claims a 40% latency reduction and a 3x throughput improvement over Mistral Small 3, with benchmark results showing competitive performance against GPT-OSS 120B and Qwen models while producing significantly shorter outputs. The release includes day-0 availability as an NVIDIA NIM and support across vLLM, llama.cpp, SGLang, and Transformers.

Long Context Evolution Frontier Model Releases Mistral AI Mistral Small 4 Pixtral +14 more

4Qwen Research·May 18, 2026·source ↗

OFASys: Multitask Multimodal Learning Framework from Alibaba/Qwen

Alibaba's Qwen team released OFASys, an open-source framework designed to simplify multimodal multitask learning, building on their earlier OFA unified pretrained model. The system aims to reduce engineering friction in setting up multi-task, multi-modal training pipelines, including data batching and training stability. It is positioned as infrastructure for building generalist AI models with minimal code overhead.

Agent and Tool Ecosystem Multimodal Progress Alibaba OFA Qwen +1 more

4Qwen Research·May 18, 2026·source ↗

OFA: Towards Building a One-For-All Unified Multimodal Pretrained Model

Alibaba's Qwen team introduces OFA (One-For-All), a unified multimodal pretrained model designed to handle both understanding and generation tasks across multiple modalities within a single framework. The model is trained with instruction-based multitask pretraining to give it diverse capabilities. This work was published in late 2022 as part of the broader wave of generalist multimodal models. It represents an early effort toward a single model architecture capable of spanning vision, language, and cross-modal tasks.

Frontier Model Releases Multimodal Progress Alibaba DAMO Academy Qwen OFA (One-For-All) +1 more

4Qwen Research·May 18, 2026·source ↗

Introducing the Qwen Series: Overview of Alibaba's Open-Source LLM Journey

Alibaba's Qwen team published a retrospective introduction to the Qwen series of large language models, four months after the initial Qwen-7B open-source release. The post consolidates links to their paper, GitHub, Hugging Face, and ModelScope repositories, and outlines the team's objectives for the open-source LLM program. It serves as a canonical reference point for the Qwen model family's public positioning.

Frontier Model Releases Open Weights Progress Alibaba Qwen-7B Qwen +2 more

6Qwen Research·May 18, 2026·source ↗

Introducing Qwen-VL-Plus and Qwen-VL-Max: Upgraded Multimodal Models from Alibaba

Alibaba's Qwen team has launched two enhanced versions of their multimodal model, Qwen-VL-Plus and Qwen-VL-Max, building on the open-sourced Qwen-VL released in September 2023. Key improvements include substantially boosted image reasoning capabilities, enhanced detail recognition and text extraction from images, and support for high-definition images exceeding one million pixels across various aspect ratios. The upgrades represent a significant step forward in the Qwen-VL series' generalization and visual understanding capabilities.

Frontier Model Releases Open Weights Progress Qwen-VL Qwen-VL-Max Alibaba +2 more