Entity · paper

Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

paper · active · be-my-tutor-on-policy-co-distillation-for-mutual-llm-improvement-via-peer-feedback-15959e09 · 1 event · first seen Jun 15, 2026

Aliases: Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

Co-occurring entities

On-Policy Co-Distillation

More like this (12)

MOPD: Multi-Teacher On-Policy Distillation for Capability Integration in LLM Post-Training
On-Policy Co-Distillation
Multi-Teacher On-Policy Distillation
Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring
Learning from the Self-future: On-policy Self-distillation for dLLMs
On-Policy Distillation for LLM Safety: A Routing Approach to Template-Robust Realignment
LLM-as-a-Coach: Experiential Learning for Non-Verifiable Tasks
Can LLMs Judge Better Than They Generate? Evaluating Task Asymmetry, Mechanistic Interpretability and Transferability for In-Context QA
Pass the Baton: Trajectory-Relayed On-Policy Distillation
Teaching LLMs to Self-Evolve: Cultivating Core Meta-Skills with Reinforcement Learning
MyMentorLLM
Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

Recent events (1)

5 · arXiv · cs.CL · Jun 15, 2026

OPCoD: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

Researchers introduce On-Policy Co-Distillation (OPCoD), a training framework in which two LLMs, each stronger in a different domain, iteratively tutor each other using on-policy rollouts and peer feedback. The method uses cognizance-based gating to control when feedback is given and feedback anchoring to ground it in the problem context. On Science Q&A tasks, OPCoD achieves a Pareto improvement for both models across all evaluated domain pairs, outperforming one-way distillation and single-model fine-tuning baselines.
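The loop described in the summary can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's implementation: each "model" is reduced to a per-domain probability of answering correctly, the gate is a simple confidence threshold, anchoring is modeled by attaching the problem text to the feedback, and the update rule is invented for illustration. `ToyModel`, `peer_feedback`, `co_distill`, `threshold`, and `lr` are all hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyModel:
    name: str
    skill: dict  # domain -> probability of answering correctly (assumption)

    def rollout(self, domain, rng):
        # On-policy step: the answer is sampled from the model's current skill.
        return rng.random() < self.skill[domain]

    def confidence(self, domain):
        # Stand-in for the tutor's self-assessed competence in a domain.
        return self.skill[domain]


def peer_feedback(tutor, problem, domain, threshold):
    # Gating (assumed form): the tutor stays silent below the threshold.
    if tutor.confidence(domain) < threshold:
        return None
    # Anchoring (assumed form): the feedback carries the problem context.
    return f"[{domain}] {problem}: hint from {tutor.name}"


def co_distill(a, b, problems, steps=50, threshold=0.6, lr=0.1, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        # Both directions in each step: b tutors a, then a tutors b.
        for student, tutor in ((a, b), (b, a)):
            domain, problem = rng.choice(problems)
            if student.rollout(domain, rng):
                continue  # the student's own answer was correct
            feedback = peer_feedback(tutor, problem, domain, threshold)
            if feedback is not None:
                # Toy update: move the student's skill toward 1.0.
                student.skill[domain] += lr * (1.0 - student.skill[domain])
    return a, b
```

In this toy setting a skill value can only rise, so neither model gets worse by construction. A real training run has no such guarantee, which is why the Pareto result reported above is an empirical finding rather than a structural one.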

Evaluation and Benchmarking · Alignment and RLHF · On-Policy Co-Distillation · Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback