Entity · paper

Training-Free Looped Transformers

paper · active · training-free-looped-transformers-5b2282fe · 1 event · first seen May 25, 2026

Aliases: Training-Free Looped Transformers

Co-occurring entities

CommonsenseQA · OpenBookQA · Forward Euler ODE · Qwen3-30B-A3B-Instruct · MMLU-Pro · Mixture of Experts · Qwen3-4B-Instruct · Moonlight-16B-A3B-Instruct

More like this (12)

Looped Transformer
Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers
Dynamic Short Convolutions Improve Transformers
Variable-Width Transformers
Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers
Sparse Transformer
The Key to Going Linear: Analysis-Driven Transformer Linearization
ktransformers
permutation-invariant transformers
TRL (Transformer Reinforcement Learning)
Invariant Learning Dynamics of Transformers in Inductive Reasoning Tasks
Invariant Learning Dynamics of Transformers in Inductive Reasoning Tasks

Recent events (1)

arXiv · cs.LG · May 25, 2026

Training-Free Looped Transformers: Inference-Time Recurrence via ODE-Motivated Layer Reapplication

The paper introduces a method to retrofit recurrence onto frozen pretrained transformer checkpoints at inference time by looping a contiguous mid-stack block of layers without any fine-tuning or architectural changes. Naive block reapplication degrades performance, so the authors motivate their approach by treating pre-norm transformer blocks as forward Euler ODE steps and replacing one large update with smaller damped sub-steps. Evaluated across seven model families including dense, sparse MoE, and MLA+MoE architectures, the method yields consistent benchmark improvements (e.g., +2.64 pp on MMLU-Pro for Qwen3-4B-Instruct) at no training cost.
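The forward Euler reading described above can be illustrated with a toy sketch. A pre-norm residual block computes x + f(x), which is one Euler step of size 1 for dx/dt = f(x). Reapplying the block naively takes several full-size steps, while the damped variant splits the update into smaller ones. Everything here is an assumption made for illustration: the names `euler_step`, `loop_block`, and `toy_branch` are hypothetical, the 1/K step size is one plausible damping choice, and the summary does not state the paper's actual schedule, block selection, or number of sub-steps.

```python
from typing import Callable, List

Vector = List[float]
Branch = Callable[[Vector], Vector]


def euler_step(x: Vector, branch: Branch, step_size: float) -> Vector:
    """Apply one residual update x + h * f(x).

    With step_size = 1.0 this is the ordinary residual connection
    of a pre-norm block.
    """
    update = branch(x)
    return [xi + step_size * ui for xi, ui in zip(x, update)]


def loop_block(x: Vector, branch: Branch, steps: int, step_size: float) -> Vector:
    """Reapply the same block `steps` times with a fixed step size."""
    for _ in range(steps):
        x = euler_step(x, branch, step_size)
    return x


def toy_branch(x: Vector) -> Vector:
    """Hypothetical stand-in for a block's norm + attention/MLP branch.

    A real branch is a frozen pretrained sublayer; this linear pull
    toward zero only makes the arithmetic easy to follow.
    """
    return [-0.5 * xi for xi in x]


x0 = [4.0, -2.0]

# Ordinary forward pass: the block is applied once with a full step.
single_pass = loop_block(x0, toy_branch, steps=1, step_size=1.0)

# Naive looping: two full steps, so the state moves twice as far.
naive_loop = loop_block(x0, toy_branch, steps=2, step_size=1.0)

# Damped looping (assumed 1/K scaling): two sub-steps of size 0.5.
damped_loop = loop_block(x0, toy_branch, steps=2, step_size=0.5)

print(single_pass, naive_loop, damped_loop)
```

In this toy setting the damped loop ends at [2.25, -1.125], close to the single-pass result [2.0, -1.0], whereas the naive loop overshoots to [1.0, -0.5]. That only shows why smaller sub-steps keep the hidden state near what later frozen layers expect; it says nothing about the benchmark gains reported in the paper.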

Frontier Model Releases · Inference Economics · CommonsenseQA · OpenBookQA · Forward Euler ODE