Training-Free Looped Transformers

paper · active · provisional · training-free-looped-transformers-5b2282fe · 1 event · first seen 22d ago

Aliases: Training-Free Looped Transformers

Recent events (1)

6 · arXiv · cs.LG · 22d ago

Training-Free Looped Transformers: Inference-Time Recurrence via ODE-Motivated Layer Reapplication

The paper introduces a method to retrofit recurrence onto frozen pretrained transformer checkpoints at inference time by looping a contiguous mid-stack block of layers, without any fine-tuning or architectural changes. Because naive block reapplication degrades performance, the authors motivate their approach by viewing pre-norm transformer blocks as forward Euler ODE steps, replacing one large update with several smaller, damped sub-steps. Evaluated across seven model families, including dense, sparse MoE, and MLA+MoE architectures, the method yields consistent benchmark improvements (e.g., +2.64 pp on MMLU-Pro for Qwen3-4B-Instruct) at no training cost.
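
The forward Euler reading can be illustrated with a toy sketch. A pre-norm residual block computes x + f(norm(x)), which has the form of one Euler step of size 1; looping it n times at full step size takes n such steps, whereas n sub-steps of size 1/n keep the total step budget at 1. Everything below is an assumption made for illustration: `toy_block` is a hypothetical stand-in for a frozen attention/MLP branch, and the uniform 1/n damping is only one plausible choice. The summary does not state the paper's actual damping schedule or which layers are looped.

```python
import math


def layer_norm(x, eps=1e-5):
    """Plain layer normalisation over a list of floats (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_block(h):
    """Hypothetical stand-in for a frozen residual branch (attention + MLP)."""
    return [math.tanh(0.5 * v) for v in h]


def euler_step(x, step):
    """One pre-norm residual update, x + step * f(norm(x)); step=1.0 is the usual block."""
    update = toy_block(layer_norm(x))
    return [xi + step * ui for xi, ui in zip(x, update)]


def naive_loop(x, n_loops):
    """Reapply the block n_loops times at full step size (total step budget n_loops)."""
    for _ in range(n_loops):
        x = euler_step(x, 1.0)
    return x


def damped_loop(x, n_loops):
    """Reapply the block n_loops times with step 1/n_loops (total step budget 1).

    Assumption: uniform damping, chosen here only for illustration.
    """
    for _ in range(n_loops):
        x = euler_step(x, 1.0 / n_loops)
    return x


def displacement(x0, x1):
    """Euclidean distance between the input and output hidden states."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x0, x1)))


if __name__ == "__main__":
    x0 = [1.0, -2.0, 0.5, 3.0]
    print("naive :", displacement(x0, naive_loop(x0, 4)))
    print("damped:", displacement(x0, damped_loop(x0, 4)))
```

In this toy setting the naive loop moves the hidden state much further from its starting point than a single block application would, while the damped loop stays within the range of one full step. That is one way to picture why unmodified reapplication could push activations outside what the later frozen layers expect.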