OLMo2
olmo2-0d7f5e9b · 1 events · first seen 22d ago
Aliases: OLMo2
Recent events (1)
Shannon Scaling Law: A Noisy-Channel Framework for LLM Capacity and Non-Monotonic Training Phenomena
Researchers propose the Shannon Scaling Law, a theoretical framework that models LLM training as information transmission over a noisy channel using the Shannon-Hartley theorem. The framework maps model parameters to channel bandwidth and training tokens to signal power, which yields a fundamental SNR-based capacity limit. That limit accounts for non-monotonic phenomena, such as catastrophic overtraining and quantization-induced degradation, which classical power-law scaling laws cannot capture. Validated on Pythia and OLMo2 under Gaussian noise, quantization, and fine-tuning perturbations, the law achieves strong R² scores and successfully extrapolates from 6.9B to 12B parameter models trained on up to 307B tokens. It outperforms both classical and perturbation-aware scaling laws and predicts U-shaped performance degradation when SNR is insufficient.
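For readers unfamiliar with the Shannon-Hartley theorem mentioned in the summary, the following minimal sketch computes the textbook channel capacity C = B * log2(1 + S/N). It is only the standard formula, not the paper's fitted scaling law: the function name `shannon_capacity` is hypothetical, and using a parameter count as bandwidth or a token count as signal power in the usage lines is an illustrative assumption based on the mapping described above.

```python
import math


def shannon_capacity(bandwidth: float, signal_power: float, noise_power: float) -> float:
    """Textbook Shannon-Hartley capacity: C = B * log2(1 + S/N).

    Hypothetical helper for illustration only. The summarized paper's actual
    functional form and fitted constants are not given in this page.
    """
    if bandwidth <= 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and noise_power must be > 0; signal_power must be >= 0")
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


# Illustrative stand-ins (assumptions, not values from the paper):
# bandwidth ~ model parameters, signal_power ~ training tokens.
bandwidth = 2.0
signal_power = 3.0

# With bandwidth and signal fixed, capacity shrinks as noise grows,
# which is the qualitative SNR dependence the summary refers to.
capacities = [shannon_capacity(bandwidth, signal_power, n) for n in (1.0, 2.0, 4.0)]
print(capacities)
```

The sketch shows only the direction of the effect: holding bandwidth and signal power fixed, a larger noise term lowers the SNR and therefore the capacity bound.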