paper
Learning from the Self-future: On-policy Self-distillation for dLLMs
paper · active · provisional
learning-from-the-self-future-on-policy-self-distillation-for-dllms-ad5db843 · 1 event · first seen 3h ago
Aliases: Learning from the Self-future: On-policy Self-distillation for dLLMs
Co-occurring entities
More like this (12)
on-policy self-distillation
on-policy distillation
Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation
Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback
ExpRL: Exploratory RL for LLM Mid-Training
The Role of Feedback Alignment in Self-Distillation
On-Policy Co-Distillation
Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs
Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
Recent events (1)
d-OPSD: First on-policy self-distillation framework tailored for diffusion LLMs
Researchers introduce d-OPSD, the first on-policy self-distillation (OPSD) framework designed specifically for diffusion large language models (dLLMs). The method addresses a fundamental mismatch between existing autoregressive OPSD approaches and the arbitrary-order generation of dLLMs in two ways: it uses suffix conditioning on self-generated answers, and it applies divergence supervision at the step level rather than the token level. Across four reasoning benchmarks, d-OPSD outperforms RLVR and SFT baselines while requiring only ~10% of the optimization steps of RLVR, suggesting strong sample efficiency gains for dLLM post-training.