
Toward Calibrated Mixture-of-Experts Under Distribution Shift



More like this (12)

Sparse Mixture-of-Experts
Mixture of Experts
From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
Redesign Mixture-of-Experts Routers with Manifold Power Iteration
From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
Data Mixture Surgery
Uncertainty Calibration
Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning
Optimal Deterministic Multicalibration and Omniprediction
Generalized Method of Moments
distributionally robust optimization

Recent events (1)

arXiv · cs.AI · 47h ago

Calibrated Mixture-of-Experts under distribution shift: an adversarial reweighting approach

A new arXiv preprint analyzes how mixture-of-experts (MoE) models maintain calibration under distribution shift, focusing on the interaction between the routing mechanism and the calibration of individual experts. The authors prove that expert-level calibration is sufficient for calibration of the overall model when routing is hard (each input is assigned to exactly one expert) but insufficient when routing is soft (expert outputs are blended by gate weights). To close this soft-routing gap, they propose an adversarial reweighting method that penalizes the calibration error of the routed aggregate under distribution shift, and they report improved accuracy-calibration tradeoffs across model classes and tasks.
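To make the hard- versus soft-routing distinction concrete, here is a toy illustration in Python. It is not the authors' construction or proof. It assumes that "expert calibration" means each expert is calibrated on the inputs routed to it, uses one calibration bin per distinct predicted value, and, purely for illustration, models the adversary as a maximum over a small hand-picked set of importance weightings; the paper's actual reweighting objective may differ. The data, function names (`ece`, `hard_routed`, `soft_routed`, `worst_case_ece`) and candidate weightings are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy population (not from the paper): four equally likely
# inputs with deterministic binary labels.
XS = [1, 2, 3, 4]
LABELS = [1, 0, 1, 0]


def expert_a(x):
    # Perfect predictor: calibrated on any input distribution.
    return 1.0 if x in (1, 3) else 0.0


def expert_b(x):
    # Constant base-rate predictor: calibrated wherever the label rate is 0.5.
    return 0.5


def ece(preds, labels, weights=None):
    """Weighted expected calibration error, one bin per distinct prediction.

    Equals sum over bins of (bin weight / total weight) * |mean(y) - p|.
    """
    if weights is None:
        weights = [1.0] * len(preds)
    residual = defaultdict(float)  # prediction value -> sum of w * (y - p)
    for p, y, w in zip(preds, labels, weights):
        residual[p] += w * (y - p)
    return sum(abs(r) for r in residual.values()) / sum(weights)


def hard_routed(x):
    # Hard routing: x in {1, 2} goes only to expert A, x in {3, 4} only to B.
    return expert_a(x) if x in (1, 2) else expert_b(x)


def soft_routed(x, gate=0.5):
    # Soft routing: every input gets the same gate-weighted blend of both
    # experts, so each expert "sees" the full (uniform) input distribution.
    return gate * expert_a(x) + (1 - gate) * expert_b(x)


def worst_case_ece(preds, labels, candidate_weights):
    """Illustrative adversary: pick the candidate reweighting with the largest ECE."""
    return max(ece(preds, labels, w) for w in candidate_weights)


# Each expert is calibrated on the inputs it receives under either router.
print(ece([expert_a(x) for x in XS[:2]], LABELS[:2]))  # 0.0 (A on its hard-routed inputs)
print(ece([expert_b(x) for x in XS[2:]], LABELS[2:]))  # 0.0 (B on its hard-routed inputs)
print(ece([expert_b(x) for x in XS], LABELS))          # 0.0 (B on all inputs, soft routing)

# Aggregates: hard routing stays calibrated, soft routing does not.
hard_preds = [hard_routed(x) for x in XS]
soft_preds = [soft_routed(x) for x in XS]
print(ece(hard_preds, LABELS))  # 0.0
print(ece(soft_preds, LABELS))  # 0.25

# A covariate shift that doubles the weight on input 3.
shifts = [[1, 1, 1, 1], [1, 1, 2, 1]]
print(worst_case_ece(hard_preds, LABELS, shifts))  # 0.1
```

Under hard routing the aggregate inherits expert calibration (ECE 0.0). Blending a perfect expert with a base-rate expert instead yields systematically underconfident predictions (0.75 and 0.25 where the true label rates are 1 and 0), so the soft-routed ECE is 0.25 even though each expert is calibrated on what it sees. The worst-case term shows that a modest shift already exposes a 0.1 gap in the hard-routed model, which is the kind of error a shift-aware calibration penalty is meant to surface.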

Frontier Model Releases · Evaluation and Benchmarking · Toward Calibrated Mixture-of-Experts Under Distribution Shift