
Masked Diffusion Models

technique · active · provisional · masked-diffusion-models-2ab3a5c3 · 1 event · first seen 22d ago

Aliases: Masked Diffusion Models

Recent events (1)

6 · arXiv · cs.LG · 22d ago

Looped Diffusion Language Models (LoopMDM): Depth Scaling via Layer Looping

LoopMDM selectively loops the early-middle transformer layers of a masked diffusion language model (MDM), which scales effective depth without adding parameters. The approach matches the performance of a same-size MDM with up to 3.3× fewer training FLOPs and outperforms deeper non-looped MDMs on reasoning benchmarks, with gains of up to 8.5 points on GSM8K. Varying the loop count at inference time scales compute, and adaptive loop scheduling provides additional efficiency gains. Attention analysis suggests that looping works by promoting interactions among masked token positions.
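
The layer-looping idea in the summary above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `make_layer`, `looped_forward`, and `effective_depth` are hypothetical, each "layer" is a toy residual update standing in for a transformer block, and the loop span and loop count are arbitrary illustrative values. It only shows how reusing a contiguous span of layers raises effective depth while the number of distinct layers (and so the parameter count) stays fixed.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function from a hidden state to a hidden state.
Layer = Callable[[List[float]], List[float]]


def make_layer(scale: float, shift: float) -> Layer:
    """Hypothetical stand-in for a transformer block: a residual affine update."""
    def layer(h: List[float]) -> List[float]:
        return [x + scale * x + shift for x in h]
    return layer


def looped_forward(
    layers: Sequence[Layer],
    h: List[float],
    loop_start: int,
    loop_end: int,
    n_loops: int,
) -> List[float]:
    """Run layers[:loop_start] once, layers[loop_start:loop_end] n_loops times,
    then layers[loop_end:] once. The same layer objects are reused on every
    pass, so no parameters are added."""
    if not 0 <= loop_start < loop_end <= len(layers):
        raise ValueError("loop span must be a non-empty slice of the layer stack")
    if n_loops < 1:
        raise ValueError("n_loops must be at least 1")
    for layer in layers[:loop_start]:
        h = layer(h)
    for _ in range(n_loops):
        for layer in layers[loop_start:loop_end]:
            h = layer(h)
    for layer in layers[loop_end:]:
        h = layer(h)
    return h


def effective_depth(n_layers: int, loop_start: int, loop_end: int, n_loops: int) -> int:
    """Number of layer applications in one forward pass."""
    return n_layers + (n_loops - 1) * (loop_end - loop_start)


# Toy stack of 6 layers; each one adds 1.0, so the output counts layer applications.
layers = [make_layer(0.0, 1.0) for _ in range(6)]
out = looped_forward(layers, [0.0], loop_start=1, loop_end=3, n_loops=3)
```

In this sketch, changing `n_loops` at call time changes the compute per forward pass without touching the layer stack, which is the sense in which the loop count can serve as an inference-time compute knob.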