technique
DOPD
technique · active · provisional
dopd-a6cdb229 · 1 event · first seen 12h ago · Aliases: DOPD
Recent events (1)
DOPD: Advantage-aware dual on-policy distillation to address privilege illusion in LLM/VLM training
Researchers introduce DOPD (Dual On-policy Distillation), a knowledge distillation framework that dynamically routes token-level supervision between a privileged teacher and a privileged student policy, based on the advantage gap and relative probabilities. The method addresses a failure mode called 'privilege illusion,' in which an information asymmetry between teacher and student is mistaken for a transferable capability gap. In experiments on both LLM and VLM settings, DOPD outperforms vanilla on-policy distillation and related methods, with additional gains in stability, continual learning, and out-of-distribution performance.
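
To make the idea of token-level routing concrete, here is a minimal sketch in plain Python. The summary above does not state DOPD's actual routing rule, so everything here is an assumption made for illustration: the function name `route_tokens`, the `gap_threshold` parameter, the example numbers, and the rule itself (send a token to the teacher only when its advantage gap exceeds a threshold and the teacher assigns the sampled token at least as much probability as the privileged student). It should not be read as the authors' method.

```python
import math

def route_tokens(teacher_logp, student_logp, advantage_gap, gap_threshold=0.0):
    """Pick a supervision source per token.

    Hypothetical rule (NOT taken from the DOPD paper): a token is routed to
    the teacher when its advantage gap exceeds `gap_threshold` and the
    teacher's log-probability of the sampled token is at least the privileged
    student's. All other tokens are routed to the privileged student.
    """
    if not (len(teacher_logp) == len(student_logp) == len(advantage_gap)):
        raise ValueError("inputs must have the same length")

    routes = []
    for t_lp, s_lp, gap in zip(teacher_logp, student_logp, advantage_gap):
        log_ratio = t_lp - s_lp  # relative probability, in log space
        if gap > gap_threshold and log_ratio >= 0.0:
            routes.append("teacher")
        else:
            routes.append("student")
    return routes

# Made-up per-token values for a three-token response.
teacher_logp = [math.log(0.6), math.log(0.1), math.log(0.5)]
student_logp = [math.log(0.3), math.log(0.4), math.log(0.5)]
advantage_gap = [0.8, 0.9, -0.2]

routes = route_tokens(teacher_logp, student_logp, advantage_gap)
print(routes)
```

In this toy input, only the first token has both a positive advantage gap and a teacher probability that is at least the student's, so it is the only one routed to the teacher. A real implementation would compute these quantities from model outputs and feed the routing mask into a per-token distillation loss.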