Entity · model

Qwen1.5-MoE-A2.7B

model · active · qwen1-5-moe-a2-7b-98f6e2f3 · 2 events · first seen May 18, 2026

Aliases: Qwen1.5-MoE-A2.7B

Co-occurring entities

OLMoE-1B-7B-0924 · From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models · DeepSeek Coder V2 lite · Mixtral · Qwen1.5-7B · Mistral 7B · Mixture of Experts · ModelScope · HuggingFace · Alibaba Qwen Team

More like this (12)

Qwen1.5-110B · Qwen1.5-72B · Qwen3-1.7B · Qwen1.5-7B · Qwen1.5-32B · Qwen 2.5-7B · Qwen2.5-8B · Qwen2.5-7B · Qwen3.5-0.8B · Qwen2.5-1.5B · Qwen2.5-0.5B · Qwen2-57B-A14B

Recent events (2)

6 · arXiv · cs.CL · Jun 10, 2026

Causal audit finds routing statistics do not predict expert importance in MoE pruning

A new arXiv paper conducts a token-level interventional audit of Mixture-of-Experts (MoE) pruning heuristics across three architectures (OLMoE-1B-7B, Qwen1.5-MoE, DeepSeek-V2-Lite). It finds that no standard observational metric (utilization rates, activation norms, routing weight distributions) reliably predicts which experts can be removed without functional cost. Effect sizes fall below Cohen's d = 0.17 across all 60 metric-layer combinations after multiple-comparison correction, and only one signal, at OLMoE's final layer, is significant. The authors argue that existing pruning methods succeed not because they identify dispensable experts but because early-layer redundancy makes most selection criteria interchangeable. The work frames this as a concrete counterexample to the broader interpretability practice of treating associational (rung-1) evidence as if it supported interventional (rung-2) conclusions.

Evaluation and Benchmarking · Inference Economics · OLMoE-1B-7B-0924 · From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models · Qwen1.5-MoE-A2.7B · +2 more
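
The effect-size figure cited in the summary above (Cohen's d = 0.17) can be made concrete with a minimal sketch. It assumes the common pooled-standard-deviation form of Cohen's d for two independent samples; the source does not say which variant the paper uses. The function name `cohens_d` and the sample values are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical data: loss increase after ablating experts that an
# observational metric ranks as "important" vs. "unimportant".
ranked_important = [0.52, 0.48, 0.50, 0.55, 0.45]
ranked_unimportant = [0.50, 0.47, 0.49, 0.54, 0.45]

d = cohens_d(ranked_important, ranked_unimportant)
# A metric is only a useful pruning signal if |d| is clearly large;
# the summary reports values below 0.17 for the audited metrics.
print(f"Cohen's d = {d:.3f}")
```

A small |d| means the two groups of experts are nearly indistinguishable in their measured ablation cost, which is the sense in which an observational metric fails to predict expert importance.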

6 · Qwen Research · May 18, 2026

Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters

Alibaba's Qwen team releases Qwen1.5-MoE-A2.7B, a mixture-of-experts model with only 2.7 billion activated parameters, and claims performance parity with 7B dense models such as Mistral 7B and Qwen1.5-7B. The model activates roughly one-third as many parameters per token as a 7B dense model, which lowers inference compute. The release follows growing industry interest in MoE architectures sparked by Mixtral, and the model is available on GitHub, HuggingFace, and ModelScope.

Frontier Model Releases · Open Weights Progress · Mixtral · Qwen1.5-MoE-A2.7B · Qwen1.5-7B · +6 more