Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models
paper · active · provisional
beyond-fully-random-masking-attention-guided-denoising-and-optimization-for-diffusion-language-models-f5942cd2 · 1 event · first seen 6d ago
Aliases: Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models
More like this (12)
Denoising Diffusion Probabilistic Models
Denoising Diffusion Policy Optimization
Knowledge Editing in Masked Diffusion Language Models
Self-Augmenting Retrieval for Diffusion Language Models
LESS: Mutual-Stability Sampling for Diffusion Language Models
Diffusion Language Models
DirectAudioEdit: Inversion-Free Text-Guided Audio Editing via Diffusion Prediction Contrast
Masked Diffusion Models
A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise
continuous diffusion language model
Language Model Finetuning
Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models
Recent events (1)
AGDO: Attention-guided denoising and optimization framework improves diffusion language model reasoning
Researchers propose AGDO, a framework that replaces fully random masking in diffusion large language models (dLLMs) with an attention-guided denoising order and attention-guided token weighting during both fine-tuning and reinforcement learning. The work is motivated by an empirical finding that tokens with stronger attention to the unmasked context are more stable and more critical for reasoning. Experiments on math and coding benchmarks show that AGDO outperforms existing post-training methods for dLLMs, strengthening the case for attention-aware training in parallel-decoding language models.
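The summary does not say how AGDO measures "attention to unmasked context" or how that signal becomes a denoising order and token weights. The following is a minimal, hypothetical sketch of the general idea, not the paper's method. It assumes a single attention matrix already averaged over heads and layers, scores each masked position by the attention mass it places on unmasked positions, unmasks the highest-scoring positions first, and turns the same scores into per-token loss weights. The helper names `context_attention`, `denoising_order`, and `token_weights` are illustrative inventions.

```python
from typing import List, Sequence

# Hypothetical helpers illustrating "attention to unmasked context".
# Not taken from the AGDO paper; the scoring and weighting rules are assumptions.


def context_attention(attn: Sequence[Sequence[float]], masked: Sequence[bool]) -> List[float]:
    """Score each masked position by the attention mass it puts on unmasked positions.

    attn[i][j] is the weight from query position i to key position j
    (assumed already averaged over heads and layers). Unmasked positions score 0.0.
    """
    scores: List[float] = []
    for i, row in enumerate(attn):
        if not masked[i]:
            scores.append(0.0)
        else:
            scores.append(sum(w for j, w in enumerate(row) if not masked[j]))
    return scores


def denoising_order(attn: Sequence[Sequence[float]], masked: Sequence[bool], k: int) -> List[int]:
    """Pick up to k masked positions to unmask next, highest context attention first."""
    scores = context_attention(attn, masked)
    candidates = [i for i, m in enumerate(masked) if m]
    # Descending score; ties broken by position so the order is deterministic.
    candidates.sort(key=lambda i: (-scores[i], i))
    return candidates[:k]


def token_weights(attn: Sequence[Sequence[float]], masked: Sequence[bool]) -> List[float]:
    """Loss weights proportional to context attention, averaging 1.0 over masked positions.

    Unmasked positions get 0.0 because a masked-diffusion loss is taken only on masked tokens.
    """
    scores = context_attention(attn, masked)
    masked_idx = [i for i, m in enumerate(masked) if m]
    weights = [0.0] * len(masked)
    if not masked_idx:
        return weights
    mean = sum(scores[i] for i in masked_idx) / len(masked_idx)
    for i in masked_idx:
        # Fall back to uniform weighting if no masked token attends to the context.
        weights[i] = scores[i] / mean if mean > 0 else 1.0
    return weights


if __name__ == "__main__":
    masked = [False, True, True, False]
    attn = [
        [0.5, 0.2, 0.2, 0.1],
        [0.6, 0.1, 0.1, 0.2],  # position 1 attends mostly to unmasked context
        [0.1, 0.3, 0.4, 0.2],  # position 2 attends mostly to other masked tokens
        [0.25, 0.25, 0.25, 0.25],
    ]
    print(denoising_order(attn, masked, k=1))  # → [1]
    print(token_weights(attn, masked))
```

Normalizing the weights to a mean of 1.0 over masked positions keeps the overall loss scale comparable to uniform (fully random) weighting, so under this assumption only the relative emphasis between tokens changes.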