Almanac
benchmark

multi-hop reasoning

benchmark · active · provisional · multi-hop-reasoning-f30191a3 · 1 event · first seen 16d ago

Aliases: multi-hop reasoning

Recent events (1)

6 · arXiv · cs.LG · 16d ago

Positional vs. Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

Researchers train a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks to study how attention heads specialize into positional or symbolic roles during learning. They find that successful task learning correlates with the emergence of 'pure' heads—exclusively positional or symbolic—and provide theoretical constructions showing how single-layer RoPE-based attention realizes these functions geometrically. A novel 'discrepancy' metric formalizes the robustness difference between the two head types, with symbolic mechanisms shown to extrapolate more reliably to longer sequences than positional ones. The findings have implications for understanding length generalization failures in RoPE-based models.
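The RoPE geometry mentioned in the summary rests on a standard property: after rotary embedding, the query-key logit depends only on the relative offset between the two positions, not on their absolute values. The sketch below illustrates that property in plain Python. It is not the paper's construction and does not reproduce its 'discrepancy' metric; the function names `rope` and `score`, the base of 10000, and the toy vectors are all assumptions chosen for illustration.

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply a rotary embedding: rotate each consecutive pair
    (vec[i], vec[i+1]) by the angle pos * base**(-i/d), i = 0, 2, 4, ...
    (hypothetical helper; base=10000 is the commonly used default)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def score(q, k, q_pos, k_pos):
    """Unnormalized attention logit between a query at q_pos and a key at k_pos."""
    return sum(a * b for a, b in zip(rope(q, q_pos), rope(k, k_pos)))

# Toy 4-dimensional query and key (arbitrary illustrative values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (query 3 tokens after the key) at two absolute positions.
near = score(q, k, 5, 2)
far = score(q, k, 105, 102)

# A different relative offset (4 tokens) generally yields a different logit.
other = score(q, k, 5, 1)

print(abs(near - far) < 1e-9)  # logits agree up to floating-point error
print(near != other)
```

A head whose logits are driven mainly by this offset term behaves positionally, while a head whose logits are driven by the content of `q` and `k` behaves symbolically; how the paper separates the two cases is not shown here.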