Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

paper · active · attention-amnesia-in-hybrid-llms-when-cot-fine-tuning-breaks-long-range-recall-and-how-to-fix-it-989cae93 · 1 event · first seen Jun 10, 2026

Co-occurring entities

Jet-Nemotron · Needle-in-a-Haystack · HypeNet · QK-Restore

More like this (12)

Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models
FlashbackCL: Mitigating Temporal Forgetting in Federated Learning
Towards Root Memories: Benchmarking and Enhancing Implicit Logical Memory Retrieval for Personalized LLMs
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
Towards Mechanistically Understanding Why Memorized Knowledge Fails to Generalize in Large Language Model Finetuning
Extending LLM Context via Associative Recurrent Memory
Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models
Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
Retrieval-Augmented Fine-Tuning
Holonomy Memory Reinforcement Learning
A sleep-like consolidation mechanism for LLMs
CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

Recent events (1)

6 · arXiv · cs.CL · Jun 10, 2026

QK-Restore: Fixing long-context recall degradation caused by CoT fine-tuning in hybrid LLMs

Researchers find that chain-of-thought supervised fine-tuning (CoT-SFT) systematically degrades long-context recall in hybrid linear-attention models (HypeNet, Jet-Nemotron). Needle-in-a-Haystack performance collapses: HypeNet-9B, for example, drops from 67.2% to 9.4% at 256K context. The paper identifies the root cause as CoT-SFT biasing attention gradients toward short-range patterns, which corrupts the query-key projections responsible for long-range routing. It proposes QK-Restore, a training-free fix that restores only W_Q and W_K from the pre-SFT checkpoint, recovering long-context capability while preserving the reasoning gains.
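
The restore step described above amounts to a selective checkpoint merge. A minimal sketch follows, assuming checkpoints are available as name-to-weight mappings and that query/key projection weights can be identified by a name suffix. The function name `qk_restore`, the suffixes in `QK_MARKERS`, and the toy parameter names are illustrative assumptions, not the paper's code, and the sketch does not distinguish attention layers from linear-attention layers.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Hypothetical name suffixes for W_Q and W_K; real checkpoints may use other names.
QK_MARKERS: Tuple[str, ...] = ("q_proj.weight", "k_proj.weight")


def qk_restore(
    pre_sft: Mapping[str, Any],
    post_sft: Mapping[str, Any],
    markers: Tuple[str, ...] = QK_MARKERS,
) -> Tuple[Dict[str, Any], List[str]]:
    """Merge two checkpoints: keep post-SFT weights, except Q/K projections.

    Returns the merged mapping and the list of parameter names that were
    taken from the pre-SFT checkpoint. No training or gradient step is involved.
    """
    merged = dict(post_sft)  # shallow copy; the inputs are left untouched
    restored: List[str] = []
    for name in post_sft:
        if any(name.endswith(marker) for marker in markers):
            if name not in pre_sft:
                raise KeyError(f"{name} is missing from the pre-SFT checkpoint")
            merged[name] = pre_sft[name]
            restored.append(name)
    return merged, restored


# Toy checkpoints with made-up parameter names and one-element "weights".
pre = {
    "layers.0.attn.q_proj.weight": [1.0],
    "layers.0.attn.k_proj.weight": [2.0],
    "layers.0.attn.v_proj.weight": [3.0],
}
post = {
    "layers.0.attn.q_proj.weight": [1.5],
    "layers.0.attn.k_proj.weight": [2.5],
    "layers.0.attn.v_proj.weight": [3.5],
}
merged, restored = qk_restore(pre, post)
```

In this toy case the query and key entries of `merged` come from `pre`, while the value projection keeps its fine-tuned entry from `post`. With real checkpoints the same function should apply to loaded state dicts, provided the suffixes are adjusted to the model's actual parameter names.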

Long Context Evolution · Alignment and RLHF · Jet-Nemotron · Needle-in-a-Haystack · HypeNet · +2 more