Late-Stage LoRA
late-stage-lora-35206a61 · 1 event · first seen 25d ago
Aliases: Late-Stage LoRA
Recent events (1)
Hyperfitting Explained: Terminal Geometric Expansion in Final Transformer Layers Drives Diversity Gains
This paper investigates the 'hyperfitting' phenomenon, in which fine-tuning LLMs to near-zero loss on small datasets improves open-ended generation and reduces repetition, and demonstrates that it is mechanistically distinct from temperature scaling. Entropy-matched control experiments falsify both the temperature-equivalence and static vocabulary reweighting hypotheses. The effect is instead localized to a 'Terminal Expansion' in the final transformer block, where feature-space dimensionality expands by ~80.8 dimensions, enabling promotion of deep-tail tokens via context-dependent rank reordering. The authors introduce Late-Stage LoRA, a targeted fine-tuning strategy that updates only the final 5 layers, achieving robust generation with minimal parameter updates.
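The layer selection behind Late-Stage LoRA can be illustrated with a minimal sketch. The summary states only that the final 5 layers are updated, so everything else here is an assumption for illustration: the helper names `late_stage_layer_indices` and `is_trainable`, the 32-block model depth, and the `layers.<idx>.` parameter naming scheme. This is not the authors' implementation.

```python
import re


def late_stage_layer_indices(num_layers: int, last_k: int = 5) -> list[int]:
    """Return the 0-based indices of the final `last_k` transformer blocks."""
    if not 0 < last_k <= num_layers:
        raise ValueError("last_k must be between 1 and num_layers")
    return list(range(num_layers - last_k, num_layers))


def is_trainable(param_name: str, layer_indices: list[int]) -> bool:
    """Return True if the parameter sits in one of the selected blocks.

    Assumes parameter names contain 'layers.<idx>.' (a hypothetical but
    common naming scheme); parameters outside any block return False.
    """
    match = re.search(r"\blayers\.(\d+)\.", param_name)
    return match is not None and int(match.group(1)) in layer_indices


# Assumed depth of 32 blocks, chosen only for illustration.
indices = late_stage_layer_indices(num_layers=32, last_k=5)
print(indices)  # [27, 28, 29, 30, 31]

# Hypothetical parameter names to show which ones would receive updates.
for name in ("model.layers.3.mlp.up_proj.weight",
             "model.layers.30.self_attn.q_proj.weight",
             "lm_head.weight"):
    print(name, is_trainable(name, indices))
```

With the Hugging Face PEFT library, a list like `indices` could plausibly be passed as the `layers_to_transform` argument of `LoraConfig` so that adapters attach only to those blocks. Whether the paper configures it this way is not stated in the summary.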