Qwen3-30B-A3B

model · active · qwen3-30b-a3b-88c735da · 6 events · first seen May 18, 2026

Aliases: Qwen3-30B-A3B, Qwen3-VL-30B-A3B

More like this (12)

Qwen3-30B · Qwen3-235B · Qwen3-30B-A3B-Base · Qwen3-4B · Qwen3.5-35B-A3B · Qwen3.6-35B-A3B · Qwen3-30B-A3B-Instruct · Qwen2-57B-A14B · Qwen3.5-122B-A10B · Qwen3.6-27B · Qwen3.5 397B A17B · Qwen3.5-35B-A3B-Base

Recent events (6)

6 · arXiv · cs.AI · 40h ago

Empirical study finds inference-time scaling yields diminishing returns for local computer-use agents

Researchers present a systematic empirical study of inference-time scaling across four dimensions (contextual, temporal, structural, parallel) for locally-deployed computer-use agents under hardware constraints. Evaluating Qwen3-VL-8B/30B-A3B, UI-TARS-1.5-7B, and OpenCUA-7B on OSWorld, they find that additional compute often shifts rather than eliminates failure modes—contextual scaling saturates, temporal scaling extends erroneous trajectories, and structural decomposition adds overhead. The findings argue for selective compute allocation and failure-aware control mechanisms tailored to local model capabilities.

Evaluation and Benchmarking Inference Economics Qwen3-30B-A3B Qwen3-4B OpenCUA-7B +4 more

4 · arXiv · cs.CL · Jul 15, 2026

FukuyamaBench and mechanism-aware training improve LLM chemical reaction reasoning

Researchers introduce a large-scale reaction mechanism reasoning dataset and FukuyamaBench, a benchmark derived from Fukuyama's Advanced Organic Reaction Mechanism textbook, to evaluate step-by-step chemical mechanism inference in LLMs. A fine-tuned Qwen3-30B-A3B model achieves 8.3% exact pathway match on FukuyamaBench Set A, outperforming the specialized FlowER model at 5.1%. The work argues that mechanism-aware training addresses hallucinations and physical inconsistencies in current chemical LLMs, which typically focus only on coarse-grained product prediction and retrosynthesis.

Evaluation and Benchmarking FukuyamaBench Qwen3-30B-A3B FlowER

6 · arXiv · cs.CL · Jun 30, 2026

MOPD: Multi-Teacher On-Policy Distillation for integrating multiple RL-trained capabilities in LLMs

Researchers propose Multi-teacher On-Policy Distillation (MOPD), a post-training paradigm that first trains domain-specialized RL teacher models, then distills them into a student model using on-policy rollouts to eliminate exposure bias. Evaluated on Qwen3-30B-A3B, MOPD outperforms Mix-RL, Cascade RL, Off-Policy Finetune, and Param-Merge baselines while preserving nearly all per-domain capability. The method has been deployed in production for MiMo-V2-Flash, an industrial-scale frontier model, validating its practical applicability. The approach also enables parallel, decoupled development of domain teachers, reducing cross-domain interference in multi-capability post-training.

Frontier Model Releases Alignment and RLHF Qwen3-30B-A3B MiMo-V2-Flash Multi-Teacher On-Policy Distillation +1 more

4 · arXiv · cs.CL · Jun 25, 2026

SARA framework aligns MoE routing distributions to improve low-resource multilingual performance

Researchers introduce SARA (Semantically Anchored Routing Alignment), a framework that addresses cross-lingual routing divergence in sparse Mixture-of-Experts LLMs by aligning the internal routing distributions of low-resource language tokens to match those of high-resource semantic anchors via symmetric Jensen-Shannon divergence constraints. Unlike logit-level distillation, SARA operates directly on MoE routing layers to encourage mechanistic consistency in expert selection across languages. Experiments on Qwen3-30B-A3B and Phi-3.5-MoE-instruct across 5 low-resource languages show modest but consistent gains (up to +1.2%) on Global-MMLU over standard instruction tuning.

Evaluation and Benchmarking Open Weights Progress Global-MMLU Qwen3-30B-A3B Phi-3.5-MoE-instruct +1 more
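
The routing-alignment idea in the SARA summary above can be illustrated with a small sketch. It is not the authors' implementation: it only shows how a symmetric Jensen-Shannon divergence between two routing distributions could be computed and averaged over MoE layers. The function names (`js_divergence`, `routing_alignment_loss`) and the plain-list representation of router logits are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn raw router logits into a probability distribution over experts."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence, bounded by ln(2)."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def routing_alignment_loss(low_resource_logits, anchor_logits):
    """Hypothetical auxiliary loss: mean JS divergence across MoE layers.

    Each argument is a list with one entry per layer, and each entry is
    the router logits for one token (low-resource) or its semantic
    anchor (high-resource). The per-layer averaging is an assumption.
    """
    per_layer = [
        js_divergence(softmax(lo), softmax(hi))
        for lo, hi in zip(low_resource_logits, anchor_logits)
    ]
    return sum(per_layer) / len(per_layer)

# Toy example: two layers, four experts each.
low = [[2.0, 0.1, 0.1, 0.1], [0.3, 0.2, 1.5, 0.0]]
anchor = [[0.1, 2.0, 0.1, 0.1], [0.3, 0.2, 1.4, 0.1]]
loss = routing_alignment_loss(low, anchor)
```

In this toy input the first layer routes the two tokens to different experts and the second layer routes them almost identically, so nearly all of the loss comes from the first layer. In training, a term like this would be added to the usual instruction-tuning objective with some weight; that weight is not given in the summary.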

6 · arXiv · cs.CL · May 19, 2026

ZEDA: Post-Trained MoE Models Can Skip Half Their Experts via Self-Distillation

This paper introduces Zero-Expert Self-Distillation Adaptation (ZEDA), a framework that converts static post-trained Mixture-of-Experts (MoE) language models into dynamic ones without pre-training from scratch. ZEDA injects parameter-free zero-output experts into each MoE layer and uses two-stage self-distillation with the original model as a frozen teacher. Applied to Qwen3-30B-A3B and GLM-4.7-Flash across 11 benchmarks, ZEDA eliminates over 50% of expert FLOPs with marginal accuracy loss and achieves approximately 1.20× end-to-end inference speedup, outperforming the strongest dynamic MoE baseline by 4–6 points.

Training Infrastructure Frontier Model Releases Self-Distillation ZEDA (Zero-Expert Self-Distillation Adaptation) Qwen3-30B-A3B +3 more
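
The zero-output experts described in the ZEDA summary above can be pictured with a minimal routing sketch. This is an illustration under stated assumptions, not the paper's method: the router scores real experts and parameter-free zero experts together, takes the top-k, and skips computation for any selected slot that is a zero expert. The function name `route_with_zero_experts`, the softmax gating, and the lack of gate renormalization are all assumptions.

```python
import math

def softmax(scores):
    """Normalize router scores into gate weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_with_zero_experts(x, router_scores, real_experts, top_k):
    """Run one token through a toy MoE layer with zero-output experts.

    router_scores has one score per real expert followed by one score
    per zero expert. Indices >= len(real_experts) are zero experts:
    they output nothing and trigger no computation.
    Returns (output vector, number of real experts actually executed).
    """
    num_real = len(real_experts)
    gates = softmax(router_scores)
    chosen = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]

    output = [0.0] * len(x)
    real_calls = 0
    for idx in chosen:
        if idx >= num_real:
            continue  # zero expert selected: skip the expert FLOPs entirely
        y = real_experts[idx](x)
        output = [o + gates[idx] * yi for o, yi in zip(output, y)]
        real_calls += 1
    return output, real_calls

# Toy layer: two real experts plus two zero experts, top-2 routing.
experts = [lambda v: [2 * a for a in v], lambda v: [a + 1 for a in v]]
token = [1.0, -1.0]
out, calls = route_with_zero_experts(token, [0.1, 2.0, 3.0, 0.0], experts, top_k=2)
```

Here the top-2 slots are one zero expert and one real expert, so only one real expert runs for this token instead of two. Averaged over many tokens, the share of selected slots that land on zero experts is the share of expert computation skipped, which is the quantity the summary reports as exceeding 50%.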

8 · Qwen Research · May 18, 2026

Qwen3 Release: Flagship 235B MoE and Full Model Family Announced

Alibaba's Qwen team has released Qwen3, a new family of large language models led by the flagship Qwen3-235B-A22B mixture-of-experts model. The team reports that the flagship is competitive with DeepSeek-R1, OpenAI o1 and o3-mini, Grok-3, and Gemini-2.5-Pro on coding, math, and general-capability benchmarks. A smaller MoE variant, Qwen3-30B-A3B, reportedly outperforms QwQ-32B while activating only about one-tenth as many parameters (roughly 3B versus 32B), and the Qwen3-4B model is said to rival Qwen2.5-72B-Instruct. Models are available on Hugging Face, ModelScope, and Kaggle.

Frontier Model Releases Evaluation and Benchmarking Alibaba Qwen DeepSeek V4 Qwen3-30B-A3B +10 more