Entity · technique

DeepSeek Sparse Attention

technique · active · deepseek-sparse-attention-c74b4eaf · 3 events · first seen May 18, 2026

Aliases: DeepSeek Sparse Attention, DeepSeek Sparse Attention (DSA)

Co-occurring entities

DeepSeek V4 · GLM-5.1 · PIVOT · RULER · LongBench v2 · DeepSeek-V4-Flash · Claude Code · Gemini-3.1-Pro · HuggingFace · DeepSeek API · TileLang · DeepSeek-V3.1-Terminus

More like this (12)

sparse attention · Block Sparse Attention · ProbSparse Attention · DeepSeek API · MiniMax Sparse Attention · DeepSeek V4 · DeepSeek-V3.1-Base · DeepSeek-V2.5-1210 · DeepSeek-Math-V2 · DeepSeek-V3-0324 · DeepSeek-V4-Pro-DSpark · Cross-Layer Sparse Attention

Recent events (3)

6 · arXiv · cs.CL · 4d ago · source ↗

PIVOT: Training-free sparse attention indexer cuts DeepSeek-V3.2 latency by up to 1.6x

PIVOT (Proxy Indexing Via One full-prefix Traversal) is a training-free drop-in replacement for the DeepSeek Sparse Attention (DSA) indexer. The stock indexer scans the full prefix once per query, which costs O(L²) over a sequence of length L; PIVOT groups nearby queries and shares a single prefix scan across each group. Two variants, PIVOT-Reuse and PIVOT-Refine, trade speed for fidelity, with PIVOT-Refine matching the accuracy of the dense indexer. Evaluated on DeepSeek-V3.2 and GLM-5.1 across LongBench and RULER, PIVOT accelerates the indexer by up to 4x and reduces end-to-end latency by up to 1.6x at long context.

Long Context Evolution · Inference Economics · DeepSeek V4 · GLM-5.1 · DeepSeek Sparse Attention · +3 more
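The query-grouping idea in the PIVOT summary above can be illustrated with a toy sketch. This is not the paper's algorithm: the function names, the mean-of-queries proxy, and the dot-product scoring are all assumptions chosen for illustration. It only shows why sharing one prefix scan across a group of nearby queries reduces the number of indexer score evaluations.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(scores, k):
    # Indices of the k highest scores, returned in ascending position order.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def per_query_select(queries, keys, k):
    # Baseline: each query t scores every key in its causal prefix 0..t.
    selections, evals = [], 0
    for t, q in enumerate(queries):
        scores = [dot(q, keys[j]) for j in range(t + 1)]
        evals += len(scores)
        selections.append(top_k(scores, k))
    return selections, evals

def grouped_select(queries, keys, k, group):
    # Toy grouping (an assumption, not PIVOT itself): one proxy scan
    # per group of consecutive queries, reused by every group member.
    selections, evals = [], 0
    dim = len(queries[0])
    for start in range(0, len(queries), group):
        end = min(start + group, len(queries))
        members = queries[start:end]
        # Hypothetical proxy: the mean of the group's query vectors.
        proxy = [sum(q[d] for q in members) / len(members) for d in range(dim)]
        scores = [dot(proxy, keys[j]) for j in range(end)]  # single shared scan
        evals += len(scores)
        shared = top_k(scores, k)
        for t in range(start, end):
            # Each query reuses the shared picks, minus positions it cannot see yet.
            selections.append([j for j in shared if j <= t])
    return selections, evals

rng = random.Random(0)
L, DIM, K, GROUP = 64, 8, 4, 8
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(L)]
keys = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(L)]

base_sel, base_evals = per_query_select(queries, keys, K)
group_sel, group_evals = grouped_select(queries, keys, K, GROUP)
print(base_evals, group_evals)  # 2080 288
```

With 64 positions and groups of 8, the shared scan needs 288 score evaluations instead of 2080. The price is that every query in a group receives the same candidate set, which is the speed-versus-fidelity trade-off the summary attributes to the two PIVOT variants.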

9 · Deepseek News · May 19, 2026 · source ↗

DeepSeek V4 Preview Release: 1.6T-param Pro and 284B Flash Models with 1M Context, Open-Sourced

DeepSeek has released DeepSeek-V4 as an open-weights preview comprising two MoE variants: V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active parameters). Both models support a 1M-token context by default, enabled by a novel architecture that combines token-wise compression with DeepSeek Sparse Attention (DSA). DeepSeek claims open-source SOTA for V4-Pro on agentic coding benchmarks, along with math, STEM, and coding performance rivaling top closed-source models, while V4-Flash offers near-parity reasoning at lower cost and latency. The API is live today with OpenAI and Anthropic compatibility, and legacy model endpoints will be retired in July 2026.

Long Context Evolution · Frontier Model Releases · DeepSeek V4 · DeepSeek-V4-Flash · Claude Code · +7 more

8 · Deepseek News · May 18, 2026 · source ↗

DeepSeek Releases V3.2-Exp with Sparse Attention Architecture and 50%+ API Price Cut

DeepSeek has released DeepSeek-V3.2-Exp, an experimental model built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve long-context performance and reduce compute costs during training and inference. Benchmarks indicate V3.2-Exp performs on par with V3.1-Terminus while achieving efficiency gains. The release is accompanied by a 50%+ API price reduction effective immediately, open-weights release on Hugging Face, a technical report, and GPU kernel code in TileLang and CUDA.

Training Infrastructure · Long Context Evolution · DeepSeek API · DeepSeek V4 · TileLang · +5 more
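To make "fine-grained sparse attention" in the V3.2-Exp summary above concrete, here is a minimal sketch of one decoding step, assuming a cheap indexer that scores every prefix token and a top-k cut before ordinary attention. The function name, the separate low-dimensional index vectors, and the dot-product indexer are illustrative assumptions, not DeepSeek's released kernels.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_attention_step(q, keys, values, index_q, index_keys, k):
    # 1) Cheap indexer pass: one score per prefix token (hypothetical scoring).
    index_scores = [dot(index_q, ik) for ik in index_keys]
    # 2) Keep only the k highest-scoring token positions.
    ranked = sorted(range(len(keys)), key=lambda j: index_scores[j], reverse=True)
    keep = sorted(ranked[:k])
    # 3) Scaled dot-product attention restricted to the kept tokens.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[j]) * scale for j in keep])
    dim_v = len(values[0])
    out = [sum(w * values[j][d] for w, j in zip(weights, keep)) for d in range(dim_v)]
    return out, keep

# Four prefix tokens; the indexer prefers tokens 1 and 3.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
index_keys = [[0.1], [0.9], [0.5], [0.7]]
q, index_q = [1.0, 0.0], [1.0]

out, keep = sparse_attention_step(q, keys, values, index_q, index_keys, k=2)
print(keep)  # [1, 3]
```

In this sketch the expensive attention step touches k tokens rather than the whole prefix, while the indexer still visits every prefix token. That remaining full-prefix scan is the cost the PIVOT entry above targets.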