Kimi Delta Attention
kimi-delta-attention-a6caf656·2 events·first seen 26d ago·Aliases: Kimi Delta Attention
Recent events (2)
Gated DeltaNet-2: Decoupling Erase and Write Gates in Linear Attention
Gated DeltaNet-2 is a new linear attention architecture from NVIDIA Labs that separates the erase and write operations in the delta-rule update into independent channel-wise gates, generalizing both Gated DeltaNet and Kimi Delta Attention (KDA). The model introduces a chunkwise WY algorithm with channel-wise decay and a gate-aware backward pass for efficient parallel training. At 1.3B parameters, trained on 100B FineWeb-Edu tokens, it outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 variants on language modeling, commonsense reasoning, and RULER needle-in-a-haystack long-context retrieval benchmarks. Code is publicly released via NVlabs on GitHub.
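To make "decoupling erase and write gates" concrete, here is a minimal sketch of a single sequential recurrent step. The summary above does not give Gated DeltaNet-2's equations, so the parameterization below is an assumption for illustration only. It starts from the KDA-style update, usually written as S_t = (I - β_t k_t k_tᵀ) Diag(α_t) S_{t-1} + β_t k_t v_tᵀ, and replaces the single scalar β_t with two hypothetical channel-wise gate vectors, `erase` and `write`. The function names (`decoupled_delta_step`, `read`) are hypothetical, and the chunkwise WY training algorithm is not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # associative state S with shape d_k x d_v


def zeros(rows: int, cols: int) -> Matrix:
    """Return an all-zero state matrix."""
    return [[0.0] * cols for _ in range(rows)]


def decoupled_delta_step(S: Matrix, k: Vector, v: Vector,
                         alpha: Vector, erase: Vector, write: Vector) -> Matrix:
    """One step of a HYPOTHETICAL decoupled-gate delta rule (assumed form):

        S_t = (I - diag(erase) k k^T) diag(alpha) S_{t-1} + diag(write) k v^T

    alpha, erase, and write are per-key-channel gates in [0, 1].
    Setting erase == write == beta * ones recovers a KDA-style update.
    """
    d_k, d_v = len(S), len(S[0])
    # Channel-wise decay: scale row i of the state by alpha[i].
    decayed = [[alpha[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # What the decayed state currently returns for key k: k^T S (length d_v).
    pred = [sum(k[i] * decayed[i][j] for i in range(d_k)) for j in range(d_v)]
    # Erase the old association along k (erase gate), then write v (write gate).
    return [[decayed[i][j]
             - erase[i] * k[i] * pred[j]
             + write[i] * k[i] * v[j]
             for j in range(d_v)]
            for i in range(d_k)]


def read(S: Matrix, q: Vector) -> Vector:
    """Query the state: o = S^T q."""
    return [sum(q[i] * S[i][j] for i in range(len(S))) for j in range(len(S[0]))]


if __name__ == "__main__":
    k = [1.0, 0.0]          # unit-norm key
    ones, off = [1.0, 1.0], [0.0, 0.0]
    S1 = decoupled_delta_step(zeros(2, 2), k, [2.0, 3.0], ones, ones, ones)
    # Full erase + full write overwrites the value stored under k.
    S2 = decoupled_delta_step(S1, k, [5.0, -1.0], ones, ones, ones)
    # No erase + full write accumulates, as in plain linear attention.
    S3 = decoupled_delta_step(S1, k, [5.0, -1.0], ones, off, ones)
    print(read(S1, k), read(S2, k), read(S3, k))
```

Under this assumed form, the erase gate controls how much of the old value stored under a key is removed and the write gate controls how much of the new value is added, so the two can differ per channel. Tying them together (erase = write = β) gives the KDA-style rule, and additionally using the same decay for every channel gives a Gated DeltaNet-style scalar-decay rule.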
Data Points: Qwen3.7-Max, OpenAI Math Proof, Gated DeltaNet-2, Trump AI Order, Microsoft Fara1.5
This edition of The Batch covers five significant AI developments: Alibaba's Qwen3.7-Max reasoning model, which offers a 1M-token context and agentic capabilities and ranks fifth on the Artificial Analysis Intelligence Index; an OpenAI reasoning model resolving the 80-year-old Erdős planar unit distance problem; Nvidia's Gated DeltaNet-2, which outperforms Mamba-3 and other linear attention architectures; Trump pulling back a proposed executive order on AI regulation; and Microsoft Research's Fara1.5 family of computer-use agents, which beats OpenAI Operator and Google Gemini on the Online-Mind2Web benchmark.