Rotary Position Embedding (RoPE)
rotary-position-embedding-rope--5f483984 · 3 events · first seen 28d ago
Aliases: Rotary Position Embedding (RoPE), RoPE (Rotary Position Embedding), 4D Rotary Position Embedding
Recent events (3)
You Could Have Designed State of the Art Positional Encoding
A Hugging Face blog post walks through the design space of positional encoding for transformer models, building intuition for why modern schemes like RoPE emerged. The post takes a pedagogical approach, showing how one could derive state-of-the-art positional encoding from first principles. It covers the evolution from absolute to relative positional encodings and the properties that make certain schemes preferable for long-context generalization.
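The relative-position property that motivates RoPE can be illustrated with a minimal sketch. It is not taken from the blog post: the function names `rope` and `dot`, the base of 10000, and the sample vectors are assumptions chosen for illustration. Each consecutive pair of features is rotated by an angle proportional to the token position, so the query-key dot product depends only on the offset between the two positions.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate each consecutive feature pair of x by an angle proportional to pos.

    Illustrative sketch: pair j uses frequency base ** (-2j / d), the common
    convention, assumed here rather than taken from the source.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i = 2j, so the exponent is -2j / d
        c, s = math.cos(theta), math.sin(theta)
        # Standard 2D rotation applied to the pair (x[i], x[i + 1]).
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))

# Hypothetical query and key vectors.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Both pairs of positions have the same offset (3), so the scores should agree
# up to floating-point error even though the absolute positions differ.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(score_near, score_far)
```

Because each step is a pure rotation, the vector norm is unchanged; only the angle between query and key carries positional information.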
Positional vs. Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization
Researchers train a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks to study how attention heads specialize into positional or symbolic roles during learning. They find that successful task learning correlates with the emergence of 'pure' heads, meaning heads that are exclusively positional or exclusively symbolic, and provide theoretical constructions showing how single-layer RoPE-based attention realizes these functions geometrically. A novel 'discrepancy' metric formalizes the robustness difference between the two head types: symbolic mechanisms extrapolate more reliably to longer sequences than positional ones. The findings bear on length generalization failures in RoPE-based models.
Apple's AToken: A Unified Multimodal Tokenizer and Encoder for Images, Videos, and 3D Objects
Apple researchers introduced AToken, a transformer model with a single 4D tokenizer and encoder-decoder architecture that handles images, videos, and 3D objects in a shared token space. The model is trained to both reconstruct and classify all three media types, using a pretrained SigLIP2 vision encoder extended to four dimensions with 4D Rotary Position Embedding. AToken approaches or matches specialized models on image classification (82.2% ImageNet), image generation (0.21 rFID), and 3D reconstruction (28.28 PSNR), while remaining competitive on video tasks. The work addresses a longstanding tension between generation-focused and classification-focused encoders by forcing embeddings to retain both fine visual detail and semantic content.