paper
Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback
More like this (12)
On-Policy Co-Distillation
Learning from the Self-future: On-policy Self-distillation for dLLMs
Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions
On-Policy Distillation (OPD)
Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization
human-LLM collaborative annotation
The Role of Feedback Alignment in Self-Distillation
LLM Agent Classroom
LLM-as-a-Judge
LLM Pretraining
Beyond Third-Person Audits: Situated Interaction Auditing for User-Centered LLM Bias Research
Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
Recent events (1)
OPCoD: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback
Researchers introduce On-Policy Co-Distillation (OPCoD), a training framework in which two LLMs, each stronger in a different domain, iteratively tutor each other using on-policy rollouts and peer feedback. The method relies on two mechanisms: cognizance-based gating, which controls when feedback is given, and feedback anchoring, which grounds that feedback in the problem context. On Science Q&A tasks, OPCoD yields a Pareto improvement for both models across all evaluated domain pairs and outperforms one-way distillation and single-model fine-tuning baselines.
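To make the "Pareto improvement" claim concrete, here is a minimal sketch that assumes each model in a domain pair is summarized by a single accuracy score, so the pair forms a two-element score vector. The function name `pareto_improves` and all numbers below are hypothetical illustrations, not part of OPCoD and not results reported for it.

```python
from typing import Sequence


def pareto_improves(baseline: Sequence[float], candidate: Sequence[float]) -> bool:
    """Return True if `candidate` Pareto-improves on `baseline`.

    Each sequence holds one score per objective (here, assumed to be one
    accuracy per model). A Pareto improvement means no objective gets worse
    and at least one objective gets strictly better.
    """
    if len(baseline) != len(candidate):
        raise ValueError("score vectors must have the same length")
    no_worse = all(c >= b for b, c in zip(baseline, candidate))
    some_better = any(c > b for b, c in zip(baseline, candidate))
    return no_worse and some_better


# Hypothetical (model_A_accuracy, model_B_accuracy) pairs; illustrative only.
before = (0.62, 0.58)
after = (0.66, 0.61)
print(pareto_improves(before, after))         # True: both scores rose
print(pareto_improves(before, (0.70, 0.55)))  # False: model B regressed
```

If "improvement for both models" is read strictly as both scores rising, the check would instead be `all(c > b for b, c in zip(baseline, candidate))`; the sketch above uses the standard, weaker Pareto definition.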