Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining
Recent events (1)
Natural Ungrokking: Pretraining Can Silently Erase Learned Rules Without Loss Signal
A new arXiv preprint documents a phenomenon called 'natural ungrokking,' in which small language models learn a generalizable rule mid-pretraining (e.g., pronoun-gender agreement) and then lose it entirely at later steps, with no trace in the loss curve. The key predictor of rule survival is corpus support frequency: how often the training stream shows the rule winning over competing surface patterns. Critically, control over a rule is asymmetric: targeted data edits can destroy it on demand, but injecting up to 450x the sustaining support level cannot restore it. The findings are validated on public Pythia checkpoints, and the study was pre-registered before data collection.