Entity · paper

Knowledge Editing in Masked Diffusion Language Models

paper · active · knowledge-editing-in-masked-diffusion-language-models-50ec3b05 · 1 event · first seen Jun 3, 2026

Aliases: Knowledge Editing in Masked Diffusion Language Models

Co-occurring entities

More like this (12)

Mask-Aware Policy Gradients for Diffusion Language Models
Induction in Both Directions: A Mechanistic Analysis of In-Context Learning in Masked Diffusion Language Models
Masked Diffusion Models
Self-Augmenting Retrieval for Diffusion Language Models
Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models
Accelerating Masked Diffusion Large Language Models: A Survey of Efficient Inference Techniques
Diffusion Language Models
knowledge editing
LESS: Mutual-Stability Sampling for Diffusion Language Models
Adaptive Multi-Step Lookahead Decoding for Diffusion Language Models
What Does a Discrete Diffusion Model Learn?
Denoising Diffusion Probabilistic Models

Recent events (1)

5 · arXiv · cs.CL · Jun 3, 2026

Knowledge editing via locate-then-edit transferred to masked diffusion language models, revealing multi-token failure mode

A new arXiv paper investigates whether locate-then-edit knowledge editing methods, developed for autoregressive models, transfer to masked diffusion language models (MDMs) such as LLaDA and Dream. The authors find that causal tracing identifies the same early-to-mid-layer MLP location in both paradigms, but that MDMs degrade systematically on multi-token edits because decoding passes through partially unmasked intermediate states that the edit was never optimized for. A correction targeting these intermediate states substantially restores multi-token editing performance. The work is described as the first systematic comparison of knowledge editing across autoregressive and diffusion-based language model paradigms.
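
To make the multi-token failure mode concrete, the following is a minimal illustrative sketch, not the paper's method. It enumerates the states a multi-token edit target can pass through while it is being unmasked. The `MASK` symbol, the function names, and the example tokens are hypothetical, and the sketch assumes that target positions may be unmasked in any order; which states a real sampler actually visits depends on the model and decoding schedule.

```python
from itertools import combinations

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration only


def intermediate_states(target_tokens):
    """Return every state of the target span with at least one position
    still masked, ordered from fewest to most revealed tokens.

    Assumption: any subset of positions may already be unmasked.
    """
    n = len(target_tokens)
    states = []
    for k in range(n):  # k = number of positions already revealed (0 .. n-1)
        for revealed in combinations(range(n), k):
            state = tuple(
                target_tokens[i] if i in revealed else MASK for i in range(n)
            )
            states.append(state)
    return states


def partially_unmasked(target_tokens):
    """Keep only states in which at least one target token is already revealed."""
    return [
        s for s in intermediate_states(target_tokens)
        if any(tok != MASK for tok in s)
    ]


if __name__ == "__main__":
    # A single-token target has only the fully masked state.
    print(len(intermediate_states(["Paris"])), len(partially_unmasked(["Paris"])))
    # A three-token target also has partially unmasked states.
    for state in partially_unmasked(["New", "York", "City"]):
        print(" ".join(state))
```

Under these assumptions, a target of n tokens has 2^n - 1 states with at least one masked position, of which 2^n - 2 are partially unmasked. A single-token target has none of the latter, which is consistent with the summary's observation that the degradation appears specifically on multi-token edits.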

Evaluation and Benchmarking · Open Weights Progress · Knowledge Editing in Masked Diffusion Language Models · Qwen · Llama · +2 more