Almanac
model

Phi-3.5-MoE-instruct

model · active · provisional · phi-3-5-moe-instruct-2d22abc8 · 1 event · first seen 11h ago

Aliases: Phi-3.5-MoE-instruct

Recent events (1)

4 · arXiv · cs.CL · 11h ago

SARA framework aligns MoE routing distributions to improve low-resource multilingual performance

Researchers introduce SARA (Semantically Anchored Routing Alignment), a framework that addresses cross-lingual routing divergence in sparse Mixture-of-Experts (MoE) LLMs. It aligns the internal routing distributions of low-resource-language tokens with those of high-resource semantic anchors via symmetric Jensen-Shannon divergence constraints. Unlike logit-level distillation, SARA operates directly on the MoE routing layers to encourage mechanistically consistent expert selection across languages. Experiments on Qwen3-30B-A3B and Phi-3.5-MoE-instruct across 5 low-resource languages show modest but consistent gains (up to +1.2%) on Global-MMLU over standard instruction tuning.
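
To make the alignment idea in the summary concrete, here is a minimal sketch of a Jensen-Shannon divergence between two routing distributions. It is not SARA's implementation: the function names (`softmax`, `js_divergence`, `routing_alignment_loss`) and the assumption that each token's routing distribution is a softmax over per-expert router logits are illustrative choices that the summary does not confirm.

```python
import math


def softmax(logits):
    """Turn router logits into a probability distribution over experts."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln(2)."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def routing_alignment_loss(low_resource_logits, anchor_logits):
    """Hypothetical per-token penalty: JS divergence between the routing
    distribution of a low-resource token and that of its high-resource
    anchor. How tokens are paired with anchors is not covered here."""
    return js_divergence(softmax(low_resource_logits), softmax(anchor_logits))


if __name__ == "__main__":
    # Made-up router logits for a layer with 4 experts.
    low_resource = [2.0, 0.5, 0.1, -1.0]
    anchor = [0.3, 2.2, 0.0, -0.5]
    print(routing_alignment_loss(low_resource, anchor))
```

In a training setup, a term like this would presumably be summed over MoE layers and added to the instruction-tuning loss with some weight; that weighting, and the anchor-selection procedure, are assumptions and are not described in the summary above.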