technique
On-Policy Co-Distillation
technique · active · provisional
on-policy-co-distillation-82ee8218 · 1 event · first seen 2d ago
Aliases: On-Policy Co-Distillation
More like this (12)
On-Policy Distillation (OPD)
on-policy distillation
on-policy self-distillation
Canonical-Context On-Policy Distillation (CCOPD)
Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback
Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation
Learning from the Self-future: On-policy Self-distillation for dLLMs
Denoising Diffusion Policy Optimization
Weak-to-Strong Distillation
ensemble distillation
Generalized Distillation
Preference Coordinated Multi-agent Policy Optimization
Recent events (1)
OPCoD: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback
Researchers introduce On-Policy Co-Distillation (OPCoD), a training framework in which two LLMs, each stronger in a different domain, iteratively tutor each other using on-policy rollouts and peer feedback. The method uses cognizance-based gating to control when feedback is given, and feedback anchoring to ground that feedback in the problem context. On Science Q&A tasks, OPCoD achieves a Pareto improvement for both models across all evaluated domain pairs, outperforming one-way distillation and single-model fine-tuning baselines.
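The summary above names the moving parts but not their definitions, so the following is only a toy sketch of the control flow under stated assumptions, not the authors' implementation. `ToyPeer`, `should_give_feedback`, `anchored_feedback`, and `co_distill_round` are hypothetical names. "Cognizance" is assumed here to mean a peer's self-estimated competence in a domain, "anchoring" is assumed to mean quoting the problem and the student's own attempt in the feedback, and a numeric skill update stands in for a real gradient step on an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPeer:
    """Hypothetical stand-in for an LLM: maps domain -> skill in [0, 1]."""
    name: str
    skill: dict
    log: list = field(default_factory=list)

    def rollout(self, problem):
        # On-policy: the attempt is produced by this peer in its current state.
        return f"{self.name} attempt at '{problem['text']}'"

    def cognizance(self, problem):
        # Assumption: cognizance = self-estimated competence in the domain.
        return self.skill.get(problem["domain"], 0.0)


def should_give_feedback(tutor, student, problem, threshold=0.6):
    # Assumed gate: the tutor is confident and knows more than the student.
    t = tutor.cognizance(problem)
    return t >= threshold and t > student.cognizance(problem)


def anchored_feedback(tutor, problem, attempt):
    # Assumed anchoring: feedback quotes the problem and the student's attempt.
    return f"[{tutor.name} on '{problem['text']}'] review of: {attempt}"


def co_distill_round(a, b, problems, lr=0.5):
    """One round in which each peer acts as student and as tutor."""
    for problem in problems:
        for student, tutor in ((a, b), (b, a)):
            attempt = student.rollout(problem)
            if not should_give_feedback(tutor, student, problem):
                continue  # gate closed: no feedback, no update
            student.log.append(anchored_feedback(tutor, problem, attempt))
            # Toy update: move the student's skill toward the tutor's.
            d = problem["domain"]
            current = student.skill.get(d, 0.0)
            student.skill[d] = current + lr * (tutor.skill[d] - current)


if __name__ == "__main__":
    a = ToyPeer("A", {"physics": 0.9, "chemistry": 0.3})
    b = ToyPeer("B", {"physics": 0.2, "chemistry": 0.8})
    problems = [
        {"domain": "physics", "text": "free-fall time from 20 m"},
        {"domain": "chemistry", "text": "balance H2 + O2 -> H2O"},
    ]
    co_distill_round(a, b, problems)
    print(a.skill, b.skill)
```

In this toy setup each peer is only tutored in the domain where the other is stronger, which mirrors the mutual, two-way arrangement the summary contrasts with one-way distillation.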