Almanac

mirror descent

technique · active · provisional · mirror-descent-e15802c1 · 1 event · first seen 22d ago

Aliases: mirror descent



Recent events (1)

6 · arXiv · cs.LG · 22d ago

Hamiltonian Probability Gradient Flow Analysis of the Muon Optimizer

This paper develops a rigorous theoretical framework for the Muon optimizer by interpreting its regularized orthogonalization map as the gradient of a Fenchel-dual smoothing of the nuclear norm, identifying Muon updates as mirror/prox steps with momentum as dual coordinates. The authors lift this structure to probability measures over matrix-valued parameters, deriving a mean-field phase-space equation that constitutes a damped Hamiltonian probability dynamics with monotonically decreasing Hamiltonian energy. Exponential convergence rates are established under gradient-dominance and curvature assumptions, and propagation-of-chaos guarantees are provided for the interacting particle system. The framework extends to transformer mixture-of-experts architectures via blockwise Muon probability flows.
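The summary's central claim, that a regularized orthogonalization map is the gradient of a smoothed nuclear norm and that Muon steps act as mirror steps with momentum as the dual variable, can be illustrated with a small NumPy sketch. It rests on an assumption not taken from the paper: the smoothing is the quadratic (Nesterov-style) dual regularizer f_mu(M) = max over ||Z||_op <= 1 of <Z, M> - (mu/2)||Z||_F^2. Under that choice the gradient is U diag(min(s/mu, 1)) V^T, which reduces to the polar factor U V^T as mu shrinks. The paper's exact regularizer may differ. The helper names `smoothed_nuclear_norm`, `smoothed_orthogonalize`, and `muon_mirror_step` and all hyperparameters are hypothetical. Commonly used Muon code approximates the polar factor with Newton-Schulz iterations rather than a full SVD.

```python
import numpy as np


def smoothed_nuclear_norm(M, mu):
    """Quadratically smoothed nuclear norm (assumed regularizer):
    f_mu(M) = max_{||Z||_op <= 1} <Z, M> - (mu / 2) * ||Z||_F^2.
    Per singular value s this is a Huber function of s."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(np.where(s <= mu, s**2 / (2 * mu), s - mu / 2)))


def smoothed_orthogonalize(M, mu):
    """Gradient of smoothed_nuclear_norm: U diag(min(s / mu, 1)) V^T.
    When mu is below every nonzero singular value this equals the
    polar factor U V^T (restricted to the range of M)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.minimum(s / mu, 1.0)) @ Vt


def muon_mirror_step(W, M, G, lr=0.02, beta=0.9, mu=1e-3):
    """Hypothetical Muon-style step read as a mirror step: the momentum M
    is the dual variable, and grad f_mu maps it to a primal direction."""
    M = beta * M + G                            # heavy-ball accumulation in dual space
    W = W - lr * smoothed_orthogonalize(M, mu)  # dual -> primal map, then step
    return W, M


# Toy usage: minimize 0.5 * ||W - A||_F^2, whose gradient is W - A.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
W, M = np.zeros_like(A), np.zeros_like(A)
loss0 = 0.5 * np.sum((W - A) ** 2)
for _ in range(200):
    W, M = muon_mirror_step(W, M, W - A)
loss = 0.5 * np.sum((W - A) ** 2)
print(f"loss: {loss0:.3f} -> {loss:.3f}")
```

A larger `mu` shrinks the update along directions whose dual singular values are small, and `mu -> 0` recovers plain orthogonalization. The sketch deliberately leaves out the paper's mean-field lift, Hamiltonian energy, and blockwise mixture-of-experts extension.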