Entity · technique

Diffusion Language Models

technique · active · diffusion-language-models-c5d4052b · 5 events · first seen May 23, 2026

Aliases: Diffusion Language Models, discrete diffusion language models, Masked Diffusion Language Models, masked diffusion language model, diffusion large language models

More like this (12)

continuous diffusion language model · Self-Augmenting Retrieval for Diffusion Language Models · LESS: Mutual-Stability Sampling for Diffusion Language Models · Diffusion Models · Message Passing Language Models · Representation-Conditioned Diffusion Models · Random Language Model · diffusion-based generative models · Masked Diffusion Models · Adaptive Multi-Step Lookahead Decoding for Diffusion Language Models · discrete diffusion models · Mask-Aware Policy Gradients for Diffusion Language Models

Recent events (5)

5 · arXiv · cs.CL · Jul 17, 2026

Mask-Aware Policy Gradients improve RL training for Masked Diffusion Language Models

A new arXiv preprint introduces a two-stage action MDP formalization for applying reinforcement learning to Masked Diffusion Language Models (MDLMs), decomposing the policy gradient into a token prediction term and a masking order term. Prior approaches ignored the position-unmasking decision, leading to intractable log-likelihood estimates; the proposed method optimizes both terms jointly. The approach achieves 87.1% on GSM8K and 53.4% on MBPP, claiming state-of-the-art results for MDLM-based reasoning and coding.

Evaluation and Benchmarking · Alignment and RLHF · Diffusion Language Models · Mask-Aware Policy Gradients for Diffusion Language Models · MBPP · +1 more
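The decomposition described in the Mask-Aware Policy Gradients summary can be illustrated with a toy example. The sketch below is not the paper's implementation; it only assumes that one decoding action is a pair (position to unmask, token to place there), so that its log-probability splits into a masking-order term and a token-prediction term. The function names, the `position_scores` input, and the REINFORCE-style surrogate are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_log_prob(position_scores, token_logits, masked_positions,
                       chosen_position, chosen_token):
    """Return (log_order, log_token) for one two-stage action.

    Assumed factorization (illustrative only):
        log pi(position, token) = log pi_order(position) + log pi_token(token | position)
    """
    # Stage 1: which still-masked position is revealed next.
    order_probs = softmax([position_scores[p] for p in masked_positions])
    log_order = math.log(order_probs[masked_positions.index(chosen_position)])
    # Stage 2: which token is written at that position.
    token_probs = softmax(token_logits[chosen_position])
    log_token = math.log(token_probs[chosen_token])
    return log_order, log_token


def surrogate_loss(advantage, log_order, log_token):
    """REINFORCE-style surrogate; its gradient has one term per stage."""
    return -advantage * (log_order + log_token)


# Hypothetical state: positions 0 and 2 are still masked, vocabulary size 2.
position_scores = [0.0, 1.0, 2.0]
token_logits = [[0.0, 0.0], [1.0, 2.0], [0.0, math.log(3.0)]]
log_order, log_token = two_stage_log_prob(
    position_scores, token_logits, [0, 2], chosen_position=2, chosen_token=1
)
loss = surrogate_loss(1.0, log_order, log_token)
```

Dropping `log_order` from the surrogate corresponds to treating the unmasking order as fixed, which is the simplification the summary attributes to prior approaches.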

6 · arXiv · cs.AI · Jun 2, 2026

SimSD: Speculative Decoding Adapted for Diffusion Language Models

SimSD introduces a training-free speculative decoding algorithm for diffusion large language models (dLLMs), which previously could not use standard token-level speculative decoding because of their bidirectional attention and masked language modeling formulation. The method uses a plug-and-play masking strategy that introduces reference tokens from a draft model together with a custom attention mask, enabling valid logit computation for drafted tokens in a single forward pass. Evaluated on SDAR-family dLLMs across four benchmarks, SimSD achieves up to a 7.46x decoding throughput improvement while maintaining or improving generation quality. The approach is compatible with other acceleration techniques such as KV caching and blockwise decoding.

Frontier Model Releases · Inference Economics · KV Cache · speculative decoding · SDAR · +4 more
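The SimSD summary does not say how drafted tokens are accepted or rejected once their logits are available. As a point of reference only, the sketch below shows the generic greedy verification step used in speculative decoding: keep the longest prefix of drafted tokens that matches the target model's argmax, then substitute the target's own choice at the first mismatch. The function name and the toy logits are hypothetical, and nothing here reproduces SimSD's masking strategy or attention mask.

```python
def accept_drafted_prefix(draft_tokens, target_logits):
    """Greedy draft verification (generic speculative decoding, not SimSD itself).

    draft_tokens: token ids proposed by the draft model.
    target_logits: one list of vocabulary logits per drafted position,
        assumed to come from a single forward pass of the target model.
    """
    accepted = []
    for token, logits in zip(draft_tokens, target_logits):
        best = max(range(len(logits)), key=lambda i: logits[i])
        if best != token:
            # First disagreement: keep the target's token and stop.
            accepted.append(best)
            break
        accepted.append(token)
    return accepted


# Toy example with a vocabulary of size 3.
draft = [2, 0, 1]
logits = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
result = accept_drafted_prefix(draft, logits)
```

In this toy case the first two drafted tokens agree with the target and the third is replaced, so three tokens are committed from one verification pass.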

6 · arXiv · cs.CL · Jun 1, 2026

Trajectory Analysis of Masked Diffusion LMs for Graph-to-Text Generation with Lambda-Scaled Structural Decoding

This paper presents the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation, analyzing the order in which tokens are unmasked during iterative decoding. The authors find that MDLMs naturally unmask entities first, then relational and function words, then structural tokens. Supervised fine-tuning disrupts this pattern by anchoring structural tokens prematurely, which causes hallucination or omission. They propose lambda-scaled structural decoding, a training-free inference-time fix that recovers 9.4 BLEU-4 points, and introduce Graph-LLaDA, which integrates a Graph Transformer encoder into LLaDA's decoding process. Cross-dataset evaluation on the LAGRANGE benchmark shows that prior baselines overfit to dataset-specific patterns while MDLM-based approaches generalize better.

Frontier Model Releases · Evaluation and Benchmarking · BLEU-4 · Graph Transformer · Diffusion Language Models · +5 more
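The summary above does not define how lambda enters the decoder, so the following is only one plausible reading, labeled as an assumption: a greedy confidence-based unmasking order in which the confidence of structural tokens is multiplied by a factor `lam`, so that `lam < 1` defers them. The function name, the `is_structural` flags, and the confidence values are hypothetical, and the sketch ranks positions once instead of recomputing confidences after every step as a real MDLM decoder would.

```python
def unmask_order(confidences, is_structural, lam=1.0):
    """Order in which a greedy confidence-based decoder would reveal positions.

    confidences: per-position model confidence for its predicted token.
    is_structural: per-position flag marking structural tokens (assumed given).
    lam: hypothetical scale applied to structural-token confidence.
    """
    scaled = [c * lam if s else c for c, s in zip(confidences, is_structural)]
    # Highest scaled confidence is unmasked first.
    return sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)


confidences = [0.9, 0.6, 0.95, 0.7]
is_structural = [False, False, True, False]

# Without scaling, the structural token at position 2 is anchored first.
unscaled_order = unmask_order(confidences, is_structural, lam=1.0)
# With lam = 0.5 it is pushed to the end of the trajectory.
scaled_order = unmask_order(confidences, is_structural, lam=0.5)
```

Under this reading, the scale only changes when a structural position is committed, not which token the model predicts there.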

6 · arXiv · cs.CL · May 26, 2026

Triplet-Block Diffusion RWKV: Unifying Linear-Time Causal Models with Bidirectional Discrete Diffusion

The paper introduces B³D-RWKV, a 7.2B-parameter language model that combines RWKV's O(L) linear-time inference with parallel bidirectional discrete diffusion via a triplet-block layout. According to the authors, this architecture resolves the tension between causal (unidirectional) and diffusion (bidirectional) attention requirements. On an 8-task evaluation suite, B³D-RWKV-7.2B achieves accuracy comparable to existing models while delivering an average 1.6× decoding throughput speedup over baselines.

Frontier Model Releases · Inference Economics · Diffusion Language Models · B³D-RWKV · RWKV · +2 more

5 · Hugging Face Blog · May 23, 2026

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

In a Hugging Face blog post, NVIDIA's Nemotron-Labs introduces diffusion-based language models aimed at extremely fast text generation. The post covers the use of diffusion processes for language modeling as an alternative to autoregressive generation, with a focus on inference speed. It continues the push by NVIDIA's research arm into non-autoregressive generation.

Frontier Model Releases · Inference Economics · Diffusion Language Models · NVIDIA · Hugging Face · +3 more