Moonlight-16B-A3B-Instruct
model · active · provisional
moonlight-16b-a3b-instruct-bdcaf460 · 1 event · first seen 22d ago
Aliases: Moonlight-16B-A3B-Instruct
Recent events (1)
Training-Free Looped Transformers: Inference-Time Recurrence via ODE-Motivated Layer Reapplication
The paper introduces a method that retrofits recurrence onto frozen pretrained transformer checkpoints at inference time by looping a contiguous block of mid-stack layers, with no fine-tuning or architectural changes. Because naively reapplying a block degrades performance, the authors treat pre-norm transformer blocks as forward Euler steps of an ODE and replace one large update with smaller, damped sub-steps. Across seven model families spanning dense, sparse MoE, and MLA+MoE architectures, the method yields consistent benchmark improvements (e.g., +2.64 pp on MMLU-Pro for Qwen3-4B-Instruct) at no training cost.
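A minimal NumPy sketch of the Euler reading, under loudly stated assumptions: each "layer" is a hypothetical toy residual branch (a tiny tanh MLP on a normalized input) rather than real attention or MoE sublayers, and the per-pass step size `damping / n_loops` is an illustrative schedule, not the paper's published one. All function names (`layer_norm`, `make_toy_branch`, `euler_pass`, `looped_forward`, `naive_loop`) are hypothetical. A pre-norm layer computes `x + g(LN(x))`, which is a forward Euler step of size 1; looping the block with a smaller step splits that one update into several damped sub-steps, while `naive_loop` simply reruns the block with full unit steps.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Per-token normalization with no learned scale/bias (toy simplification).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def make_toy_branch(d, rng):
    # HYPOTHETICAL stand-in for a frozen block's residual branch g(LN(x)).
    w1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
    w2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)

    def branch(x):
        return np.tanh(layer_norm(x) @ w1) @ w2

    return branch


def euler_pass(x, branches, step=1.0):
    # Pre-norm stack read as forward Euler: x <- x + step * g(LN(x)) per layer.
    # step=1.0 is exactly the ordinary forward pass; weights are never modified.
    for g in branches:
        x = x + step * g(x)
    return x


def looped_forward(x, layers, start, end, n_loops=1, damping=1.0):
    # Loop the contiguous mid-stack block layers[start:end] n_loops times.
    # ASSUMPTION: every pass uses step = damping / n_loops, so the block's single
    # unit-size update is replaced by n_loops smaller, damped sub-steps.
    x = euler_pass(x, layers[:start])
    step = damping / n_loops
    for _ in range(n_loops):
        x = euler_pass(x, layers[start:end], step=step)
    return euler_pass(x, layers[end:])


def naive_loop(x, layers, start, end, n_loops):
    # Naive reapplication: rerun the block with full unit steps each time.
    x = euler_pass(x, layers[:start])
    for _ in range(n_loops):
        x = euler_pass(x, layers[start:end])
    return euler_pass(x, layers[end:])


rng = np.random.default_rng(0)
d, n_layers = 16, 8
layers = [make_toy_branch(d, rng) for _ in range(n_layers)]
x0 = rng.standard_normal((4, d))  # 4 tokens, hidden size d

baseline = euler_pass(x0, layers)
looped = looped_forward(x0, layers, start=3, end=6, n_loops=4, damping=1.0)
naive = naive_loop(x0, layers, start=3, end=6, n_loops=4)
```

Setting `n_loops=1` and `damping=1.0` recovers the unmodified forward pass, which is a useful sanity check before retrofitting a real checkpoint; a `damping` below 1 shrinks each sub-step further. The toy makes no claim about which variant scores better on benchmarks.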