Entity · model

LLaDA

model · active · llada-4b486dd3 · 2 events · first seen Jun 1, 2026

Aliases: LLaDA

Co-occurring entities

Knowledge Editing in Masked Diffusion Language Models · Qwen · Llama · Dream · BLEU-4 · Graph Transformer · Diffusion Language Models · Graph-LLaDA · LAGRANGE · lambda-scaled structural decoding

More like this (12)

LLaDA-1.5-8B · LLaDA-8B · LLaDA-8B-Base · LLaVA 1.6 · LLaMA-2-13B · LLaVA-v1.5-Instruct · LLaMA-Omni · DLAM · Graph-LLaDA · MaLoRA · LLaMA-2-7B-32K-Instruct · LLaVA-1.5-7B

Recent events (2)

5 · arXiv · cs.CL · Jun 3, 2026

Knowledge editing via locate-then-edit transferred to masked diffusion language models, revealing multi-token failure mode

A new arXiv paper investigates whether locate-then-edit knowledge editing methods, developed for autoregressive models, transfer to masked diffusion language models (MDMs) such as LLaDA and Dream. The authors find that causal tracing identifies the same early-to-mid-layer MLP location in both paradigms, but MDMs degrade systematically on multi-token edits due to partially unmasked intermediate states that the edit was never optimized for. A correction targeting these intermediate states substantially restores multi-token editing performance. The work is the first systematic comparison of knowledge editing across autoregressive and diffusion-based language model paradigms.
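To make the "partially unmasked intermediate states" concrete, here is a minimal sketch that enumerates them for a multi-token edit target. It is an illustration only, not the paper's method: the `[MASK]` string, the `partial_states` helper, and the example target are all assumptions introduced here.

```python
from itertools import combinations

MASK = "[MASK]"  # placeholder mask symbol (assumption, not a real tokenizer's)


def partial_states(target_tokens):
    """Yield every partially unmasked version of a multi-token target.

    Fully masked and fully revealed states are excluded, so only the
    in-between states that iterative decoding can pass through remain.
    """
    n = len(target_tokens)
    for k in range(1, n):  # k = number of tokens already revealed
        for revealed in combinations(range(n), k):
            yield [
                tok if i in revealed else MASK
                for i, tok in enumerate(target_tokens)
            ]


# Hypothetical three-token edit target.
states = list(partial_states(["New", "York", "City"]))
for state in states:
    print(" ".join(state))
```

A single-token target has no such in-between states, while a target of n tokens has 2^n - 2 of them, which is one way to see why the failure mode would show up specifically on multi-token edits.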

Evaluation and Benchmarking · Open Weights Progress · Knowledge Editing in Masked Diffusion Language Models · Qwen · Llama · +2 more

6 · arXiv · cs.CL · Jun 1, 2026

Trajectory Analysis of Masked Diffusion LMs for Graph-to-Text Generation with Lambda-Scaled Structural Decoding

This paper presents the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation, analyzing the order in which tokens are unmasked during iterative decoding. The authors find that MDLMs naturally unmask entities first, then relational and function words, then structural tokens. Supervised fine-tuning disrupts this pattern: it anchors structural tokens prematurely, which causes hallucination or omission. They propose lambda-scaled structural decoding, a training-free inference-time fix that recovers 9.4 BLEU-4 points, and introduce Graph-LLaDA, which integrates a Graph Transformer encoder into LLaDA's decoding process. Cross-dataset evaluation on the LAGRANGE benchmark shows that prior baselines overfit to dataset-specific patterns, while MDLM-based approaches generalize better.
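The unmasking-order idea can be sketched with a toy greedy decoder that reveals the most confident position first. The summary does not say how lambda is applied, so this sketch assumes it multiplies the confidence of structural tokens; `unmask_order`, the fixed confidence values, and the position labels are all hypothetical, and a real MDLM would recompute confidences at every step.

```python
def unmask_order(confidence, structural, lam=1.0):
    """Return positions in the order a greedy confidence-based decoder
    would unmask them.

    confidence: dict mapping position -> toy model confidence
    structural: set of positions that hold structural tokens
    lam: assumed scale applied to structural-token confidence
    """
    scaled = {
        pos: (conf * lam if pos in structural else conf)
        for pos, conf in confidence.items()
    }
    # Highest scaled confidence is unmasked first.
    return sorted(scaled, key=lambda pos: scaled[pos], reverse=True)


# Toy setup: position 0 is structural, 1 is an entity, 2 is a relation word.
confidence = {0: 0.9, 1: 0.8, 2: 0.6}
structural = {0}

premature = unmask_order(confidence, structural, lam=1.0)
rescaled = unmask_order(confidence, structural, lam=0.5)
print(premature, rescaled)
```

With `lam=1.0` the structural position is revealed first, mirroring the premature anchoring described above. With `lam=0.5` it moves to the end, after the entity and the relation word.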

Frontier Model Releases · Evaluation and Benchmarking · BLEU-4 · Graph Transformer · Diffusion Language Models · +5 more