Mechanism-Driven Monitors for Preemptive Detection of LLM Training Instability

paper · active · provisional · mechanism-driven-monitors-for-preemptive-detection-of-llm-training-instability-7bcc5a3e · 1 event · first seen 38h ago

Recent events (1)

6 · arXiv · cs.CL · 38h ago

Mechanism-driven internal monitors detect LLM training instability thousands of steps before loss divergence

A new arXiv preprint proposes mechanism-driven monitoring signals derived from the functional roles of critical modules (low-precision flash attention, MoE routers) to detect training instability before it manifests in loss or gradient norms. The authors derive monitors such as spectral entropy of a QK bilinear decomposition and MoE router indicators, showing via fault-injection experiments that these signals trigger thousands of steps ahead of loss divergence. The work targets a high-cost failure mode in frontier LLM training where instability can persist undetected for thousands of steps on expensive accelerator fleets.
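The summary names the monitors but does not give their definitions, so the following is only a minimal sketch of what such signals could look like, not the authors' implementation. It assumes the "QK bilinear decomposition" can be approximated by the singular-value spectrum of a single head's `W_Q @ W_K.T`, and that one plausible router indicator is the entropy of the expert-load distribution. The function names, the `rel_drop` threshold, and the baseline comparison are all hypothetical choices made for illustration.

```python
import numpy as np


def spectral_entropy(w_q, w_k, eps=1e-12):
    """Shannon entropy of the normalized singular values of W_Q @ W_K^T.

    Assumption: w_q and w_k are one attention head's projection matrices,
    each of shape (d_model, d_head). Low entropy means the bilinear form is
    dominated by a few directions (the spectrum has collapsed).
    """
    bilinear = w_q @ w_k.T
    singular_values = np.linalg.svd(bilinear, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)
    p = p[p > eps]  # drop numerically zero modes before taking the log
    return float(-(p * np.log(p)).sum())


def router_load_entropy(expert_ids, num_experts):
    """Entropy of the fraction of tokens routed to each expert.

    Assumption: expert_ids is a 1-D integer array holding the top-1 expert
    chosen for each token. Entropy near log(num_experts) means balanced
    routing; entropy near 0 means the router has collapsed onto one expert.
    """
    counts = np.bincount(expert_ids, minlength=num_experts).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def below_baseline(value, baseline, rel_drop=0.5):
    """Hypothetical trigger: fire when a signal falls below rel_drop * baseline."""
    return value < rel_drop * baseline
```

In a training loop, signals like these would be logged every few hundred steps and compared against a baseline taken from a healthy early window. The preprint's claim is that mechanism-level signals of this kind move well before the loss or gradient norm does.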