Entity · paper

Variable-Width Transformers

paper · active · variable-width-transformers-20799a4c · 1 event · first seen Jun 17, 2026

Aliases: Variable-Width Transformers

Co-occurring entities

More like this (12)

Looped Transformer · Training-Free Looped Transformers · transformer architecture · Dynamic Short Convolutions Improve Transformers · ktransformers · Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers · Sparse-structure Multimodal Diffusion Transformer · Swift Transformers · ViT (Vision Transformer) · autoregressive transformer · feed-forward transformer · MDTransformer

Recent events (1)

5 · arXiv · cs.CL · Jun 17, 2026

Variable-Width Transformers: X-shaped architecture outperforms uniform-width baselines with 22% fewer FLOPs

Researchers propose the ><former (X-shaped transformer), a decoder-only architecture that uses wider early and late layers with narrower middle layers, implemented via a parameter-free residual resizing mechanism. Evaluated on models from 200M to 2B dense parameters and 3B MoE, the architecture consistently outperforms parameter-matched uniform-width baselines on language modeling loss. The design yields a 22% reduction in FLOPs and 15% reduction in KV cache memory under fitted scaling curves, suggesting nonuniform width allocation is a viable path to more compute-efficient language models.
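The summary above does not say how the residual resizing works, so the following is only a rough sketch under stated assumptions. It assumes a linear taper from wide outer layers to a narrow middle, and it assumes the parameter-free resize is done by truncating the residual vector when narrowing and zero-padding it when widening. The function names `x_shaped_widths`, `resize_residual`, and `run_stack` are hypothetical and are not taken from the paper.

```python
def x_shaped_widths(n_layers, d_wide, d_narrow):
    """Per-layer widths: d_wide at both ends, d_narrow in the middle.

    Assumption: a linear taper. The paper's actual schedule is not
    given in the summary.
    """
    if n_layers < 2:
        raise ValueError("need at least two layers to form an X shape")
    mid = (n_layers - 1) / 2
    widths = []
    for i in range(n_layers):
        frac = abs(i - mid) / mid  # 1.0 at the ends, 0.0 at the center
        widths.append(round(d_narrow + frac * (d_wide - d_narrow)))
    return widths


def resize_residual(x, d_out):
    """Resize a residual vector without any learned parameters.

    Assumption: truncate to narrow, zero-pad to widen.
    """
    if d_out <= len(x):
        return x[:d_out]
    return x + [0.0] * (d_out - len(x))


def run_stack(x, widths):
    """Thread a residual vector through layers of varying width."""
    for d in widths:
        x = resize_residual(x, d)
        # A real layer would add its attention and MLP outputs to x here.
    return x


widths = x_shaped_widths(n_layers=5, d_wide=8, d_narrow=4)
out = run_stack([1.0] * 8, widths)
```

With these toy settings the schedule narrows toward the middle layer and widens again, and the channels dropped in the middle come back as zeros in the final layers. Whether the ><former restores them this way, or uses a different parameter-free scheme, is not stated in the summary.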

Frontier Model Releases · Inference Economics · Q-Former · Variable-Width Transformers