
Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

paper · active · provisional · 1 event · first seen 3d ago


Co-occurring entities

Pythia


Recent events (1)

arXiv · cs.AI · 3d ago

Natural Ungrokking: Pretraining Can Silently Erase Learned Rules Without Loss Signal

A new arXiv preprint documents "natural ungrokking": small language models learn a generalizable rule mid-pretraining (e.g., pronoun-gender agreement) and then lose it entirely at later steps, with no trace in the loss curve. The key predictor of whether a rule survives is corpus support frequency, that is, how often the training stream shows the rule winning over competing surface patterns. Critically, control over the rule is asymmetric: targeted data edits can destroy a rule on demand, but injecting up to 450x the sustaining support level cannot restore it. The findings are validated on public Pythia checkpoints and were pre-registered before data collection.
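Because the loss curve reportedly gives no warning, the learn-then-lose pattern would have to be read off a rule-specific probe evaluated at each checkpoint. The sketch below shows one way such a check might look. It is not the preprint's method: the function name `find_ungrokking`, both thresholds, and the accuracy values are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def find_ungrokking(
    steps: Sequence[int],
    rule_accuracy: Sequence[float],
    learned_threshold: float = 0.9,  # assumed cutoff for "rule learned"
    lost_threshold: float = 0.6,     # assumed cutoff for "rule lost"
) -> Optional[Tuple[int, int]]:
    """Return (step_learned, step_lost) if the rule is learned and then
    stays lost through the final checkpoint; otherwise return None."""
    if len(steps) != len(rule_accuracy):
        raise ValueError("steps and rule_accuracy must have the same length")

    # First checkpoint at which the probe accuracy clears the learned cutoff.
    learned_idx = next(
        (i for i, acc in enumerate(rule_accuracy) if acc >= learned_threshold),
        None,
    )
    if learned_idx is None:
        return None  # the rule was never learned

    # Walk back from the final checkpoint while accuracy stays at or below
    # the lost cutoff, so a transient dip that recovers is not counted.
    lost_idx = len(rule_accuracy)
    while lost_idx > learned_idx + 1 and rule_accuracy[lost_idx - 1] <= lost_threshold:
        lost_idx -= 1
    if lost_idx == len(rule_accuracy):
        return None  # still above the lost cutoff at the end of training

    return steps[learned_idx], steps[lost_idx]


# Illustrative numbers only; they are not results from the preprint.
checkpoint_steps = [1000, 2000, 4000, 8000, 16000]
probe_accuracy = [0.50, 0.95, 0.92, 0.55, 0.50]
print(find_ungrokking(checkpoint_steps, probe_accuracy))
```

Requiring the accuracy to stay low through the last checkpoint is a deliberate choice here: it separates a rule that is lost for good from one that merely wobbles during training.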
