Redesign Mixture-of-Experts Routers with Manifold Power Iteration
paper·active·provisional
redesign-mixture-of-experts-routers-with-manifold-power-iteration-87887772·1 event·first seen 6d ago
Aliases: Redesign Mixture-of-Experts Routers with Manifold Power Iteration
Co-occurring entities
More like this (12)
Manifold Power Iteration
Sparse Mixture-of-Experts
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
Mixture of Experts
From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models
Layer-Adaptive Expert Pruning
Optimal Mixture Transport (OMT)
dynamic skill routing
Random Network Distillation
Data Mixture Surgery
mixture-density networks
Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback
Recent events (1)
Manifold Power Iteration redesigns MoE routers by aligning rows with expert singular directions
A new arXiv preprint proposes Manifold Power Iteration (MPI), a principled redesign of Mixture-of-Experts router matrices that aligns each router row with the principal singular direction of its associated expert. The method follows a 'Power-then-Retract' paradigm: a power step drives each row toward that singular direction, and a retraction step enforces the norm constraint. The authors validate the method on MoE pretraining at scales from 1B to 11B parameters and report improved model effectiveness.
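The 'Power-then-Retract' idea in the summary above can be illustrated with textbook power iteration on a single router row. This is a minimal sketch, not the preprint's algorithm: the helper name power_then_retract, the choice of W^T W as the power operator, the fixed-radius sphere as the constraint set, and the synthetic expert matrix are all assumptions made for illustration.

```python
import numpy as np

def power_then_retract(expert_weight, router_row, radius=1.0, steps=200):
    """Hypothetical helper: align one router row with one expert matrix.

    expert_weight: (d_ff, d_model) matrix standing in for an expert's weights.
    router_row:    (d_model,) initial router row for that expert.
    """
    r = router_row.astype(np.float64)
    for _ in range(steps):
        # Power step: W^T W amplifies the principal right singular direction.
        r = expert_weight.T @ (expert_weight @ r)
        # Retract step: rescale back onto the sphere of the given radius
        # (an assumed form of the norm constraint).
        r = radius * r / np.linalg.norm(r)
    return r

rng = np.random.default_rng(0)

# Synthetic expert with known, well-separated singular values (assumption).
U, _ = np.linalg.qr(rng.standard_normal((16, 8)))   # (16, 8), orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((8, 8)))    # (8, 8), orthogonal
singular_values = np.array([5.0, 2.0, 1.5, 1.0, 0.8, 0.5, 0.3, 0.1])
W = U @ np.diag(singular_values) @ V.T

r = power_then_retract(W, rng.standard_normal(8))

# |cosine| between the router row and the true principal right singular vector.
alignment = abs(r @ V[:, 0]) / np.linalg.norm(r)
print(f"norm={np.linalg.norm(r):.6f} alignment={alignment:.6f}")
```

In this toy setting the row keeps the prescribed norm after every step, and its direction converges to the top right singular vector up to sign. How the preprint applies the update during training is not stated in the summary.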