Entity · technique

d-OPSD

technique · active · d-opsd-7b0fb2e6 · 1 event · first seen Jun 17, 2026

Aliases: d-OPSD

Co-occurring entities

Learning from the Self-future: On-policy Self-distillation for dLLMs

More like this (12)

OPSD · FastOPD · Progressive OPD (POPD) · DOPD · DanceOPD · Truncated OPD (TOPD) · SDPO · DPPO · X³-OPD · Vision-OPD · DDPO · DPOT

Recent events (1)

5 · arXiv · cs.CL · Jun 17, 2026

d-OPSD: First on-policy self-distillation framework tailored for diffusion LLMs

Researchers introduce d-OPSD, the first on-policy self-distillation (OPSD) framework designed specifically for diffusion large language models (dLLMs). Existing OPSD approaches are built for autoregressive models, which creates a fundamental mismatch with the arbitrary-order generation of dLLMs. d-OPSD addresses this by conditioning on self-generated answers as a suffix and by supervising divergence at the step level rather than the token level. Across four reasoning benchmarks, d-OPSD outperforms RLVR (reinforcement learning with verifiable rewards) and SFT (supervised fine-tuning) baselines while requiring only ~10% of the optimization steps of RLVR, suggesting strong sample efficiency gains for dLLM post-training.
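The difference between token-level and step-level divergence supervision can be illustrated with a small sketch. This is not the paper's implementation: the summary does not state which divergence is used or how it is aggregated, so the KL direction (student against teacher), the "sum within a step, then average over steps" rule, and all names here (`kl`, `token_level_loss`, `step_level_loss`, `unmask_step`) are assumptions made for illustration. The teacher distributions are taken as given inputs; in the setting described above they would plausibly come from the same model conditioned on its own generated answer as a suffix.

```python
import math
from collections import defaultdict


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def token_level_loss(student, teacher):
    """Autoregressive-style objective (assumed): mean KL over sequence positions."""
    per_position = [kl(s, t) for s, t in zip(student, teacher)]
    return sum(per_position) / len(per_position)


def step_level_loss(student, teacher, unmask_step):
    """Hypothetical step-level objective.

    unmask_step[i] is the denoising step at which position i was revealed.
    KL is summed over the positions revealed in the same step, and the
    resulting per-step values are averaged.
    """
    per_step = defaultdict(float)
    for s, t, step in zip(student, teacher, unmask_step):
        per_step[step] += kl(s, t)
    return sum(per_step.values()) / len(per_step)


if __name__ == "__main__":
    # Toy example: 3 positions, vocabulary of size 2.
    student = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
    teacher = [[0.5, 0.5], [0.5, 0.5], [0.2, 0.8]]
    unmask_step = [0, 0, 1]  # positions 0 and 1 were revealed together in step 0

    print("token-level:", token_level_loss(student, teacher))
    print("step-level: ", step_level_loss(student, teacher, unmask_step))
```

Under these assumptions, the two objectives differ only in the unit of normalization: positions revealed in the same denoising step are treated as one supervised unit rather than as independent tokens, which is one way to avoid imposing a left-to-right token order on a model that generates in arbitrary order.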

Frontier Model Releases · Alignment and RLHF · d-OPSD · Learning from the Self-future: On-policy Self-distillation for dLLMs