Entity · model

Moonlight-16B-A3B-Instruct

model · active · moonlight-16b-a3b-instruct-bdcaf460 · 1 event · first seen May 25, 2026

Aliases: Moonlight-16B-A3B-Instruct

Co-occurring entities

CommonsenseQA OpenBookQA Forward Euler ODE Training-Free Looped Transformers Qwen3-30B-A3B-Instruct MMLU-Pro Mixture of Experts Qwen3-4B-Instruct

More like this (12)

Dream-7B-Instruct Qwen3-30B-A3B-Instruct LLaMA-2-7B-32K-Instruct Llama3-8B-Instruct Qwen2.5-7B-Instruct-1M Qwen3-4B-Instruct Apertus-8B-Instruct-2509 Qwen2-Audio-7B-Instruct Qwen2.5-VL-32B-Instruct Qwen3-Coder-480B-A35B-Instruct LFM2-8B-A1B Llama 3.3 70B Instruct

Recent events (1)

arXiv · cs.LG · May 25, 2026

Training-Free Looped Transformers: Inference-Time Recurrence via ODE-Motivated Layer Reapplication

The paper introduces a method that retrofits recurrence onto frozen pretrained transformer checkpoints at inference time by looping a contiguous mid-stack block of layers, with no fine-tuning and no architectural changes. Because naive block reapplication degrades performance, the authors take an ODE-motivated view: each pre-norm transformer block is treated as a forward Euler step, and one large update is replaced with several smaller, damped sub-steps. Evaluated across seven model families spanning dense, sparse MoE, and MLA+MoE architectures, the method yields consistent benchmark improvements (e.g., +2.64 percentage points on MMLU-Pro for Qwen3-4B-Instruct) at no training cost.
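The forward Euler view can be illustrated with a toy sketch. A pre-norm block computes x + F(x), which is one Euler step of dx/dt = F(x) with step size h = 1; splitting that update into K sub-steps of size h = damping / K keeps the total integration "time" fixed while each individual update shrinks. This is a minimal sketch under stated assumptions, not the paper's implementation: layers are hypothetical callables on lists of floats rather than real attention/MoE blocks, and the step-size schedule (damping / num_loops, applied uniformly to every layer in the looped block) is an illustrative assumption, not the authors' exact damping rule.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for one pre-norm block: maps hidden state x to its
# residual update F(x) = Sublayer(LayerNorm(x)). Any callable on floats works.
Residual = Callable[[Vector], Vector]


def step(x: Vector, residual: Residual, h: float) -> Vector:
    """One forward Euler step x + h * F(x). An ordinary layer uses h = 1."""
    return [xi + h * ui for xi, ui in zip(x, residual(x))]


def looped_forward(
    x: Vector,
    layers: Sequence[Residual],
    start: int,
    end: int,
    num_loops: int,
    damping: float = 1.0,
) -> Vector:
    """Run layers[:start] once, loop layers[start:end] num_loops times with
    step size damping / num_loops, then run layers[end:] once.

    Assumption (illustrative only): the step-size split below is not the
    paper's exact schedule. With num_loops=1 and damping=1 it reduces to the
    original frozen forward pass; damping=num_loops gives naive reapplication.
    """
    if not 0 <= start < end <= len(layers):
        raise ValueError("need 0 <= start < end <= len(layers)")
    if num_loops < 1:
        raise ValueError("num_loops must be >= 1")

    for layer in layers[:start]:      # prefix layers: unchanged, h = 1
        x = step(x, layer, 1.0)

    h = damping / num_loops           # smaller, damped sub-steps
    for _ in range(num_loops):        # reapply the contiguous mid-stack block
        for layer in layers[start:end]:
            x = step(x, layer, h)

    for layer in layers[end:]:        # suffix layers: unchanged, h = 1
        x = step(x, layer, 1.0)
    return x


def decay(v: Vector) -> Vector:
    """Toy residual F(x) = -0.5 * x, standing in for a real block."""
    return [-0.5 * vi for vi in v]


layers = [decay, decay, decay]
print(looped_forward([1.0], layers, start=1, end=2, num_loops=1))  # → [0.125]
print(looped_forward([1.0], layers, start=1, end=2, num_loops=2))  # → [0.140625]
```

In this toy, looping the middle layer twice with halved steps changes its contribution from a single multiply by 0.5 to two multiplies by 0.75, which moves it toward the exact ODE solution (a factor of e^-0.5 ≈ 0.607). Setting damping=2.0 with num_loops=2 reproduces naive reapplication (two full h = 1 steps), the baseline the source says degrades performance.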

Frontier Model Releases Inference Economics CommonsenseQA OpenBookQA Forward Euler ODE