technique
Lion
active · provisional
lion-11134066 · 1 event · first seen 2d ago
Aliases: Lion
Recent events (1)
Open problem paper questions whether AdamW converges under heavy-tailed gradient noise
An arXiv preprint poses as an open problem whether AdamW, the dominant optimizer for LLM pretraining, admits rigorous convergence guarantees under heavy-tailed stochastic gradient noise. The authors note that sign-based optimizers such as Lion and Muon already have sharp convergence rates in the heavy-tailed setting, whereas AdamW's second-moment accumulator may create a fundamental obstruction by hiding large gradients. The paper proves a positive benchmark result in a weighted metric and introduces a corridor lower-bound mechanism to characterize the potential failure mode.
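For readers unfamiliar with the two update rules being contrasted, the following is a minimal sketch of one Lion step and one AdamW step in their commonly published forms, written on plain Python lists. It is not taken from the preprint and does not reproduce its analysis. The function names `lion_step` and `adamw_step` and all default hyperparameters are illustrative assumptions.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(theta, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update (hypothetical helper); returns (new_theta, new_m)."""
    new_theta, new_m = [], []
    for p, mi, g in zip(theta, m, grad):
        # Interpolate momentum and gradient; only the sign of this value is used.
        c = beta1 * mi + (1.0 - beta1) * g
        # Sign step plus decoupled weight decay.
        new_theta.append(p - lr * (sign(c) + weight_decay * p))
        # Momentum is updated with a separate coefficient beta2.
        new_m.append(beta2 * mi + (1.0 - beta2) * g)
    return new_theta, new_m


def adamw_step(theta, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """One AdamW update at step t >= 1 (hypothetical helper).

    Returns (new_theta, new_m, new_v).
    """
    new_theta, new_m, new_v = [], [], []
    for p, mi, vi, g in zip(theta, m, v, grad):
        mi = beta1 * mi + (1.0 - beta1) * g        # first-moment estimate
        vi = beta2 * vi + (1.0 - beta2) * g * g    # second-moment accumulator
        m_hat = mi / (1.0 - beta1 ** t)            # bias correction
        v_hat = vi / (1.0 - beta2 ** t)
        # Normalized step plus decoupled weight decay.
        p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
        new_theta.append(p)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

With `weight_decay=0.0`, every coordinate in `lion_step` moves by exactly `lr` whenever the interpolated value is nonzero, however large the gradient is. In `adamw_step`, a single large gradient enters the accumulator `v` as its square and then decays at rate `beta2`, so it keeps scaling later steps on that coordinate.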