Entity · paper

Cross-Layer Sparse Attention with Shared Routing

paper · active · cross-layer-sparse-attention-with-shared-routing-362201e4 · 1 event · first seen Jun 5, 2026

Aliases: Cross-Layer Sparse Attention with Shared Routing, You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Co-occurring entities

CLSA · YOCO · Cross-Layer Sparse Attention

More like this (12)

Cross-Layer Sparse Attention Block Sparse Attention ProbSparse Attention MiniMax Sparse Attention sparse attention Graph Attention Network Graph Convolutional Attention Spend Experts Where You Are Unsure: Confidence-Adaptive Routing for Mixture-of-Experts LoRA Locality-Sensitive Hashing Attention Graph Sparse Sampling DeepSeek Sparse Attention Multi-head Latent Attention (MLA)

Recent events (1)

6 · arXiv · cs.CL · Jun 5, 2026

CLSA: Cross-Layer Sparse Attention with Shared Routing for Efficient Long-Context Inference

Researchers propose Cross-Layer Sparse Attention (CLSA), a method that builds on KV-sharing architectures (like YOCO) to share both the KV cache and the routing index across decoder layers. A single indexer computes token-level top-k selection once and reuses it across layers, reducing routing overhead while preserving fine-grained selectivity. Experiments on short- and long-context benchmarks show up to 7.6x decoding speedup and 17.1x overall throughput improvement at 128K context, addressing pre-filling, KV-cache storage, and decoding bottlenecks simultaneously.
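The shared-routing idea in the summary above can be illustrated with a toy decode step. This is a minimal sketch, not the paper's implementation: the function names (`route_top_k`, `sparse_attention`, `decode_step`), the dot-product indexer, and the tiny hand-written vectors are all assumptions made for illustration. For simplicity the per-layer queries are passed in as a list, whereas in a real decoder each layer's query depends on the previous layer's output.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(index_query, index_keys, k):
    """Hypothetical indexer: score every cached token once, keep the k best."""
    scores = [dot(index_query, key) for key in index_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # return positions in cache order

def sparse_attention(query, keys, values, selected):
    """Scaled dot-product attention restricted to the selected positions."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in selected])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, selected))
            for d in range(dim)]

def decode_step(layer_queries, shared_keys, shared_values,
                index_query, index_keys, k):
    # Routing runs once per step; every layer reuses the same index
    # and reads from the same shared KV cache.
    selected = route_top_k(index_query, index_keys, k)
    outputs = [sparse_attention(q, shared_keys, shared_values, selected)
               for q in layer_queries]
    return selected, outputs

# Toy data (assumed): 4 cached tokens, 2 layers, top-2 routing.
shared_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
shared_values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
layer_queries = [[0.0, 0.0], [0.0, 1.0]]

selected, outputs = decode_step(
    layer_queries, shared_keys, shared_values,
    index_query=[1.0, 1.0], index_keys=shared_keys, k=2,
)
print(selected)  # positions shared by both layers
print(outputs)   # one attention output per layer
```

In this sketch the routing cost is paid once per decode step rather than once per layer, and each layer attends over only `k` cached tokens instead of the full context, which is the intuition behind the reduced routing overhead described in the summary.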

Long Context Evolution Inference Economics CLSA YOCO Cross-Layer Sparse Attention with Shared Routing