technique
Thinformer
active · provisional
thinformer-05831a45 · 1 event · first seen 7d ago
Aliases: Thinformer
Recent events (1)
Express: Efficient causal attention approximation with formal guarantees and FlashAttention 2 speedups
A new tool called Express converts non-causal attention approximations into causal ones with matching theoretical guarantees, achieving approximation error on the order of log^(3/2)(n)/s while using O(s) memory. Combined with the Thinformer approximation and an I/O-aware Triton implementation, it demonstrates substantial speedups over FlashAttention 2. The work targets four practical bottlenecks: long-context prefill, KV cache compression, memory-constrained long-form decoding, and compute-constrained long-form decoding.
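
To make the stated trade-off concrete, the sketch below simply evaluates the rate log^(3/2)(n)/s for a few values. It is illustrative arithmetic only, not part of Express or Thinformer: the function name `error_rate` is hypothetical, the hidden constant is omitted, the natural logarithm is assumed (a different base only changes the constant), and reading n as the sequence length and s as the memory budget is an assumption based on the summary above.

```python
import math


def error_rate(n: int, s: int) -> float:
    """Evaluate log^(3/2)(n) / s, ignoring constant factors.

    Assumptions (not confirmed by the source): n is the sequence
    length, s is the memory budget, and the log is natural.
    """
    if n < 2 or s < 1:
        raise ValueError("expected n >= 2 and s >= 1")
    return math.log(n) ** 1.5 / s


if __name__ == "__main__":
    # At a fixed n, doubling s halves the rate; at a fixed s, the rate
    # grows only polylogarithmically as n increases.
    for n in (4_096, 65_536, 1_048_576):
        for s in (64, 128, 256):
            print(f"n={n:>9,}  s={s:>3}  rate={error_rate(n, s):.4f}")
```

Because the rate is inversely proportional to s and only polylogarithmic in n, a modest increase in memory offsets a large increase in context length, up to the omitted constant.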