logit-contribution-scoring-15379b50·1 events·first seen Aliases: Logit-Contribution Scoring
A new arXiv preprint introduces Logit-Contribution Scoring (LOCOS), a method for identifying attention heads responsible for non-literal retrieval in long-context LLMs — cases where models synthesize answers from meaning rather than copying tokens verbatim. Existing detectors fail at this task because they rely on a literal-copy criterion that misses the output-value (OV) circuit mechanism. Evaluated across Qwen3, Gemma-3, and OLMo-3.1, LOCOS outperforms prior attention-based detectors on the NoLiMa benchmark, with ablation of 50 heads on Qwen3-8B collapsing ROUGE-L from 0.401 to 0.000 while the best baseline retains 0.292. The identified heads are retrieval-specific, leaving parametric recall and arithmetic reasoning unaffected.