positional attention heads
Recent events (1)
Positional vs. Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization
Researchers train a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks to study how attention heads come to specialize in positional or symbolic roles during learning. They find that successful task learning correlates with the emergence of "pure" heads, that is, heads that are exclusively positional or exclusively symbolic. They also give theoretical constructions showing how single-layer RoPE-based attention can realize each function geometrically. A new "discrepancy" metric formalizes the difference in robustness between the two head types, and symbolic mechanisms are shown to extrapolate to longer sequences more reliably than positional ones. The findings have implications for understanding length-generalization failures in RoPE-based models.
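The distinction between the two head types can be made concrete with a toy sketch. It is not the paper's construction, and it does not implement the discrepancy metric. It uses the standard RoPE rotation, under which a query at position `m` and a key at position `n` produce a logit that depends only on the offset `n - m`. The split of each vector into a 2-dimensional rotary slice and an unrotated one-hot "content" slice is an assumption made for illustration. So are `VOCAB`, `positional_head`, and `symbolic_head`, which are hypothetical names. The positional head uses only the rotary slice and pre-rotates its key by one step, so its logit is `cos(n + 1 - m)` and it attends to the previous token. The symbolic head zeroes the rotary slice and matches on token identity, whatever the token's position.

```python
import math

VOCAB = ["a", "b", "c", "d"]  # hypothetical toy vocabulary

def rope_rotate(vec, pos, base=10000.0):
    """Standard RoPE: rotate pair (vec[2i], vec[2i+1]) by pos * base**(-2i/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(d // 2):
        angle = pos * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def score(q, k, q_pos, k_pos, rot_dims=2):
    """Attention logit. RoPE is applied to the first rot_dims coordinates only
    (assumed toy layout); the remaining content coordinates are not rotated."""
    rq = rope_rotate(q[:rot_dims], q_pos) + q[rot_dims:]
    rk = rope_rotate(k[:rot_dims], k_pos) + k[rot_dims:]
    return sum(a * b for a, b in zip(rq, rk))

def one_hot(tok):
    return [1.0 if t == tok else 0.0 for t in VOCAB]

def argmax_key(query, keys, q_pos, start=0):
    """Index (relative to start) of the key with the highest logit, looking
    causally at absolute positions start .. q_pos."""
    logits = [score(query, k, q_pos, start + n) for n, k in enumerate(keys)]
    return max(range(len(logits)), key=logits.__getitem__)

# Hypothetical positional head: content-free query and key. The key is the query
# rotated by one step, so the logit is cos(k_pos + 1 - q_pos), which peaks at
# k_pos = q_pos - 1 (the previous token).
POS_Q = [1.0, 0.0] + [0.0] * len(VOCAB)
POS_K = [math.cos(1.0), math.sin(1.0)] + [0.0] * len(VOCAB)

def positional_head(seq, start=0):
    keys = [POS_K for _ in seq]
    return argmax_key(POS_Q, keys, start + len(seq) - 1, start)

# Hypothetical symbolic head: the rotary slice is zeroed, so the logit reduces to
# a token-identity match and does not depend on position.
def symbolic_head(seq, target, start=0):
    keys = [[0.0, 0.0] + one_hot(tok) for tok in seq]
    query = [0.0, 0.0] + one_hot(target)
    return argmax_key(query, keys, start + len(seq) - 1, start)

seq = ["c", "a", "d", "b", "a"]
print(positional_head(seq), symbolic_head(seq, "d"))              # 3 2
print(positional_head(seq, 1000), symbolic_head(seq, "d", 1000))  # 3 2
```

If the whole sequence is shifted by 1000 positions, neither head changes its choice. In this idealized toy, every logit depends only on relative offset or on content. The sketch therefore shows the geometric idea but does not reproduce the extrapolation gap that the paper reports for learned heads.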