Entity · paper

Dynamic Short Convolutions Improve Transformers

paper · active · dynamic-short-convolutions-improve-transformers-2fec1b82 · 1 event · first seen Jun 3, 2026

Aliases: Dynamic Short Convolutions Improve Transformers


More like this (12)

Localized Adaptation Reveals Distinct Learning Signatures in Transformers
Training-Free Looped Transformers
Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers
Sparse Transformer
Variable-Width Transformers
Invariant Learning Dynamics of Transformers in Inductive Reasoning Tasks
Invariant Learning Dynamics of Transformers in Inductive Reasoning Tasks
Sparse-structure Multimodal Diffusion Transformer
permutation-invariant transformers
Associative Recurrent Memory Transformer
Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers
Graph Transformer

Recent events (1)

arXiv · cs.CL · Jun 3, 2026

Dynamic short convolutions yield a 1.33–1.60× compute advantage over standard Transformers

A new arXiv preprint introduces dynamic short convolutions as an architectural primitive for Transformers. The convolution filters are input-dependent, pairing the locality bias of a short convolution with the added expressivity of filters that change with the input. In experiments on language models from 150M to 2B parameters, the method consistently improves perplexity over both standard Transformers and static-convolution variants. Scaling-law fits indicate a 1.33× compute advantage when the convolutions are applied to the key, query, and value vectors, and 1.60× when one is added after every linear layer. The technique also improves linear RNNs (Mamba-2, Gated DeltaNet) and mixture-of-experts architectures, and custom Triton kernels make training practical.
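The summary describes the primitive only at a high level. The sketch below shows one plausible form of a dynamic short convolution and is not the preprint's method. It assumes a causal, depthwise (per-channel) filter of length `kernel_size`, with the filter taps produced at each position by a linear map of the current token. That map is the hypothetical matrix `W_gen`. It also assumes an optional per-channel softmax over the taps, a normalization borrowed from earlier dynamic-convolution work (Wu et al., 2019, "Pay Less Attention with Lightweight and Dynamic Convolutions"), not taken from this paper. A static short convolution is the special case where the taps do not depend on the input.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: list of rows


def matvec(W: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product W @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dynamic_short_conv(x: Matrix, W_gen: Matrix, kernel_size: int = 4,
                       normalize: bool = True) -> Matrix:
    """Causal depthwise short convolution whose taps depend on the input.

    x:      T x D sequence (one row per position).
    W_gen:  (kernel_size * D) x D hypothetical filter-generator matrix.
    Returns y with y[t][d] = sum_k taps_t[k][d] * x[t - k][d], where
    positions before the start of the sequence are treated as zeros.
    """
    T = len(x)
    D = len(x[0]) if T else 0
    if T and len(W_gen) != kernel_size * D:
        raise ValueError("W_gen must have kernel_size * D rows")

    y: Matrix = []
    for t in range(T):
        # Filter taps are generated from the current token only.
        raw = matvec(W_gen, x[t])
        taps = [raw[k * D:(k + 1) * D] for k in range(kernel_size)]  # K x D

        if normalize:
            # Softmax over the K taps, separately for each channel d.
            for d in range(D):
                col = [taps[k][d] for k in range(kernel_size)]
                m = max(col)
                e = [math.exp(c - m) for c in col]
                s = sum(e)
                for k in range(kernel_size):
                    taps[k][d] = e[k] / s

        out = [0.0] * D
        for k in range(kernel_size):
            if t - k < 0:
                break  # causal: never look at future positions
            for d in range(D):
                out[d] += taps[k][d] * x[t - k][d]
        y.append(out)
    return y
```

With `W_gen` set to all zeros and `normalize=True`, every tap equals `1 / kernel_size`, so the layer reduces to a fixed causal moving average, which is a static convolution. With nonzero `W_gen`, the output can depend nonlinearly on the input. For example, `dynamic_short_conv([[2.0], [3.0]], [[1.0], [0.0]], kernel_size=2, normalize=False)` squares each value. Under the configuration the summary associates with the 1.33× figure, such a layer would be applied to the query, key, and value vectors after their projections. The Python loops are written for clarity only. The preprint reportedly relies on custom Triton kernels for practical training speed.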

Training Infrastructure · Frontier Model Releases · Triton · Mamba · Gated DeltaNet-2