paper
Toward Calibrated Mixture-of-Experts Under Distribution Shift
paper · active · provisional
toward-calibrated-mixture-of-experts-under-distribution-shift-2350fa95 · 1 event · first seen 47h ago
Aliases: Toward Calibrated Mixture-of-Experts Under Distribution Shift
More like this (12)
Sparse Mixture-of-Experts
Mixture of Experts
From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
Redesign Mixture-of-Experts Routers with Manifold Power Iteration
From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
Data Mixture Surgery
Uncertainty Calibration
Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning
Optimal Deterministic Multicalibration and Omniprediction
Generalized Method of Moments
distributionally robust optimization
Recent events (1)
Calibrated Mixture-of-Experts under distribution shift: adversarial reweighting approach
A new arXiv preprint analyzes how mixture-of-experts (MoE) models maintain calibration under distribution shift, examining the interaction between routing mechanisms and expert-level calibration. The authors prove that expert calibration is sufficient for overall model calibration in hard-routed MoE but insufficient for soft-routed variants. To address the soft-routing gap, they propose an adversarial reweighting method that penalizes calibration errors of the routed aggregate under distribution shift, demonstrating improved accuracy-calibration tradeoffs across model classes and tasks.
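The hard-routing versus soft-routing claim can be illustrated with a small toy calculation. The sketch below is not taken from the preprint: the four-group population, the two expert prediction tables, the input-independent router, and the `calibration_error` helper are all hypothetical, and the preprint's formal definition of calibration may differ. It only shows one way that individually calibrated experts can yield a calibrated hard-routed output yet a miscalibrated soft-routed average.

```python
from collections import defaultdict

# Toy population (assumed): four equally likely input groups
# with a known true probability P(Y=1 | group).
p_true = {"a": 0.2, "b": 0.4, "c": 0.6, "d": 0.8}

# Two hypothetical experts. Each one only distinguishes a coarse
# feature of the input and predicts the mean label within it,
# so each is calibrated on the population.
expert_1 = {"a": 0.3, "b": 0.3, "c": 0.7, "d": 0.7}
expert_2 = {"a": 0.4, "b": 0.6, "c": 0.4, "d": 0.6}


def calibration_error(cells):
    """Exact expected |E[Y | prediction] - prediction|.

    `cells` is a list of (mass, prediction, true_probability) triples
    describing a discrete population; no sampling is involved.
    """
    mass = defaultdict(float)
    positives = defaultdict(float)
    for m, pred, p in cells:
        key = round(pred, 9)  # merge cells that share a prediction value
        mass[key] += m
        positives[key] += m * p
    total = sum(mass.values())
    return sum(m * abs(positives[k] / m - k) for k, m in mass.items()) / total


groups = list(p_true)

# Each expert on its own, over the whole population.
ece_expert_1 = calibration_error([(0.25, expert_1[g], p_true[g]) for g in groups])
ece_expert_2 = calibration_error([(0.25, expert_2[g], p_true[g]) for g in groups])

# Hard routing (assumed router): half of every group is sent to expert 1,
# the other half to expert 2, and the chosen expert's output is used as is.
hard_cells = [(0.125, expert_1[g], p_true[g]) for g in groups]
hard_cells += [(0.125, expert_2[g], p_true[g]) for g in groups]
ece_hard = calibration_error(hard_cells)

# Soft routing (assumed gate): every input receives the 0.5/0.5 average
# of the two expert outputs.
soft_cells = [
    (0.25, 0.5 * expert_1[g] + 0.5 * expert_2[g], p_true[g]) for g in groups
]
ece_soft = calibration_error(soft_cells)

print(f"expert 1: {ece_expert_1:.3f}")
print(f"expert 2: {ece_expert_2:.3f}")
print(f"hard-routed aggregate: {ece_hard:.3f}")
print(f"soft-routed aggregate: {ece_soft:.3f}")
```

In this toy setup both experts and the hard-routed aggregate have a calibration gap of zero (up to floating-point rounding), while the soft-routed average has a gap of 0.1: averaging pulls the predictions toward 0.5, so the aggregate is underconfident on every group. A penalty on the calibration error of the routed aggregate, as the summary describes, targets exactly this quantity rather than the per-expert errors.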