Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

Recent events (1)

arXiv · cs.AI · 3d ago

Natural Ungrokking: Pretraining Can Silently Erase Learned Rules Without Loss Signal

A new arXiv preprint documents a phenomenon it calls 'natural ungrokking': small language models learn a generalizable rule (e.g., pronoun-gender agreement) mid-pretraining, then lose it entirely at later steps, with no trace in the loss curve. The key predictor of whether a rule survives is its corpus support frequency, that is, how often the training stream shows the rule winning over competing surface patterns. Critically, control over the rule is asymmetric: targeted data edits can destroy a rule on demand, but injecting up to 450x the sustaining support level cannot restore it. The findings are validated on public Pythia checkpoints, and the study was pre-registered before data collection.
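
To make the idea of corpus support frequency concrete, here is a minimal Python sketch. It is not the preprint's metric: the summary does not give the exact definition, so the `Example` record, the `support_frequency` function, and the toy stream are all hypothetical. The sketch assumes each training example can be labeled with the token a rule predicts, the token a competing surface pattern predicts, and the token actually observed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Example:
    """Hypothetical record for one position in the training stream."""
    observed: str            # token that actually appears
    rule_prediction: str     # token the generalizable rule predicts
    surface_prediction: str  # token a competing surface pattern predicts


def support_frequency(examples: Iterable[Example]) -> float:
    """Fraction of all examples where the rule and the surface pattern
    disagree and the observed token sides with the rule.

    Assumed definition for illustration only.
    """
    total = 0
    rule_wins = 0
    for ex in examples:
        total += 1
        contested = ex.rule_prediction != ex.surface_prediction
        if contested and ex.observed == ex.rule_prediction:
            rule_wins += 1
    return rule_wins / total if total else 0.0


# Toy stream: pronoun-gender agreement versus a "most frequent pronoun" heuristic.
stream = [
    Example("she", "she", "he"),  # contested, rule wins
    Example("he", "he", "he"),    # uncontested, gives no support
    Example("he", "she", "he"),   # contested, surface pattern wins
    Example("she", "she", "he"),  # contested, rule wins
]
print(support_frequency(stream))  # 2 of 4 examples support the rule -> 0.5
```

Under this assumed definition, uncontested examples add nothing to the count, which reflects the summary's emphasis on the rule winning over a competing pattern rather than merely agreeing with it.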