paper
LESS: Mutual-Stability Sampling for Diffusion Language Models
paper · active · provisional
less-mutual-stability-sampling-for-diffusion-language-models-38d141e7 · 1 event · first seen 30h ago
Aliases: LESS: Mutual-Stability Sampling for Diffusion Language Models
Co-occurring entities
More like this (12)
Diffusion Language Models
Self-Augmenting Retrieval for Diffusion Language Models
continuous diffusion language model
Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models
Knowledge Editing in Masked Diffusion Language Models
diffusion posterior sampling
discrete diffusion models
The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models
Kolmogorov Regression for Robust Diffusion Policies
Masked Diffusion Models
Survival Diffusion Probabilistic Model (SDPM)
Denoising Diffusion Probabilistic Models
Recent events (1)
LESS: Adaptive mutual-stability sampling cuts diffusion LLM decoding steps by 72%
Researchers introduce LESS, a training-free adaptive sampler for diffusion large language models that treats token commitment as an online stopping problem. The method uses a joint stability rule combining confidence, persistence, and distributional stability to decide when to unmask tokens, avoiding wasted computation on already-stable positions. Evaluated on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B across seven benchmarks, LESS reduces reverse denoising steps by 72.1% versus fixed-budget decoding while improving accuracy over prior adaptive samplers. The step reductions translate directly to fewer Transformer forward passes and lower wall-clock latency.
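As a rough illustration of what a joint stability rule over confidence, persistence, and distributional stability could look like, here is a minimal Python sketch for a single masked position. It is not the paper's implementation: the function names, the thresholds (`conf_min`, `persist_steps`, `max_tv`), and the use of total variation distance as the stability measure are all assumptions made for the example, since the summary does not specify them.

```python
from typing import List, Sequence


def argmax(probs: Sequence[float]) -> int:
    """Index of the most probable token in one predicted distribution."""
    return max(range(len(probs)), key=lambda i: probs[i])


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions (assumed metric)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def should_commit(
    history: List[Sequence[float]],
    conf_min: float = 0.9,    # hypothetical confidence threshold
    persist_steps: int = 2,   # hypothetical persistence window
    max_tv: float = 0.05,     # hypothetical stability tolerance
) -> bool:
    """Decide whether to unmask one position, given its per-step predictions.

    `history` holds the predicted token distribution for this position at
    each denoising step so far, oldest first.
    """
    # Need enough steps to check persistence and to compare two distributions.
    if len(history) < max(persist_steps, 2):
        return False

    latest = history[-1]
    top = argmax(latest)

    # Confidence: the current top token has high probability.
    confident = latest[top] >= conf_min
    # Persistence: the same token has been the top choice for several steps.
    persistent = all(argmax(p) == top for p in history[-persist_steps:])
    # Distributional stability: the whole distribution has stopped moving.
    stable = total_variation(history[-2], latest) <= max_tv

    # Joint rule: commit only when all three criteria agree.
    return confident and persistent and stable


if __name__ == "__main__":
    steps = [[0.5, 0.3, 0.2], [0.92, 0.05, 0.03], [0.93, 0.04, 0.03]]
    print(should_commit(steps[:2]), should_commit(steps))
```

In this sketch a position that is confident and persistent after two steps is still held back if its distribution shifted sharply in the last step, and is committed only once the distribution has settled. Positions that commit early no longer need further denoising, which is the sense in which an online stopping rule can save forward passes.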