Training-Free Looped Transformers
paper · active · provisional
training-free-looped-transformers-5b2282fe · 1 event · first seen 22d ago
Aliases: Training-Free Looped Transformers
More like this (12)
Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers
Dynamic Short Convolutions Improve Transformers
Variable-Width Transformers
Sparse Transformer
TRL (Transformer Reinforcement Learning)
transformer-based neural renderer
Transformer Language Models
feed-forward transformer
Swift Transformers
Graph Transformer
Layer Looping
transformer architecture
Recent events (1)
Training-Free Looped Transformers: Inference-Time Recurrence via ODE-Motivated Layer Reapplication
The paper introduces a method to retrofit recurrence onto frozen pretrained transformer checkpoints at inference time by looping a contiguous mid-stack block of layers without any fine-tuning or architectural changes. Naive block reapplication degrades performance, so the authors motivate their approach by treating pre-norm transformer blocks as forward Euler ODE steps and replacing one large update with smaller damped sub-steps. Evaluated across seven model families including dense, sparse MoE, and MLA+MoE architectures, the method yields consistent benchmark improvements (e.g., +2.64 pp on MMLU-Pro for Qwen3-4B-Instruct) at no training cost.
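The forward-Euler reading in the summary can be illustrated with a toy sketch. A pre-norm block computes x + f(LN(x)), which has the form of one Euler step of size 1 for dx/dt = f(LN(x)). The code below is not the paper's implementation: the tanh residual branch, the function names (`make_block`, `run_stack`), and the choice of step size 1/n_loops for the damped sub-steps are all assumptions made for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Parameter-free layer norm over the last axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def make_block(dim, rng):
    """Toy pre-norm residual branch f(x) = tanh(LN(x) @ W).
    Hypothetical stand-in for a real attention + MLP block."""
    w = rng.normal(scale=dim ** -0.5, size=(dim, dim))
    return lambda x: np.tanh(layer_norm(x) @ w)

def run_stack(x, blocks, loop_start, loop_end, n_loops=1, step=1.0):
    """Apply frozen blocks in order. blocks[loop_start:loop_end] is the
    contiguous mid-stack span; it is applied n_loops times, with every
    residual update inside the span scaled by `step`."""
    for f in blocks[:loop_start]:
        x = x + f(x)                      # ordinary Euler step, size 1
    for _ in range(n_loops):
        for f in blocks[loop_start:loop_end]:
            x = x + step * f(x)           # damped sub-step
    for f in blocks[loop_end:]:
        x = x + f(x)
    return x

rng = np.random.default_rng(0)
dim = 16
blocks = [make_block(dim, rng) for _ in range(6)]
x0 = rng.normal(size=(4, dim))

baseline = run_stack(x0, blocks, 2, 4)                       # plain forward pass
naive = run_stack(x0, blocks, 2, 4, n_loops=2)               # naive reapplication
damped = run_stack(x0, blocks, 2, 4, n_loops=2, step=0.5)    # assumed step = 1/n_loops
```

With `n_loops=1` and `step=1.0` the function reduces to the unmodified forward pass, so the weights stay frozen and the only change is the inference-time schedule. Comparing `naive` and `damped` against `baseline` on a real checkpoint would need the actual model blocks in place of the toy ones.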