paper
Reinforcement Learning from Rich Feedback with Distributional DAgger
paper · active · provisional
reinforcement-learning-from-rich-feedback-with-distributional-dagger-687e5da7 · 1 event · first seen 13d ago
Aliases: Reinforcement Learning from Rich Feedback with Distributional DAgger
More like this (12)
Reinforcement Learning from Human Feedback
reinforcement learning from verifier feedback
RLIF (Reinforcement Learning from Internal Feedback)
Reinforcement Learning for Code
Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning
Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch
Reinforcement Learning
Entropy-Regularized Reinforcement Learning
Learning from the Self-future: On-policy Self-distillation for dLLMs
decoupled reinforcement learning
Hierarchical Reinforcement Learning
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Recent events (1)
DistIL: Distributional DAgger for RL from Rich Feedback beyond single-bit rewards
A new arXiv preprint introduces DistIL, a distributional variant of the DAgger imitation learning algorithm. It is designed to exploit rich feedback signals (execution traces, tool outputs, expert corrections) rather than the single-bit correctness reward used in standard reinforcement learning with verifiable rewards (RLVR). The method trains with a forward cross-entropy objective, which the preprint says provides monotonic policy improvement guarantees; the reverse KL and Jensen-Shannon divergence objectives used in prior self-distillation approaches lack such guarantees. The preprint reports that DistIL outperforms RLVR and self-distillation baselines on scientific reasoning, coding, and hard math benchmarks.
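To make the contrast between the two kinds of objective concrete, here is a minimal sketch of forward cross-entropy and reverse KL on a toy categorical distribution. It is not the preprint's implementation: the function names, the `teacher` and `student` vectors, and the three-way action space are all hypothetical, chosen only to show the textbook definitions. The sketch assumes the student assigns nonzero probability wherever the teacher does.

```python
import math


def forward_cross_entropy(teacher, student):
    """H(teacher, student) = -sum_a teacher(a) * log student(a).

    The expectation is taken under the teacher (feedback-informed)
    distribution. Terms with teacher(a) == 0 contribute nothing.
    """
    return -sum(t * math.log(s) for t, s in zip(teacher, student) if t > 0)


def reverse_kl(teacher, student):
    """KL(student || teacher) = sum_a student(a) * log(student(a) / teacher(a)).

    The expectation is taken under the student's own distribution.
    Terms with student(a) == 0 contribute nothing.
    """
    return sum(s * math.log(s / t) for t, s in zip(teacher, student) if s > 0)


# Hypothetical distributions over three candidate actions (illustrative only).
teacher = [0.49, 0.49, 0.02]
student = [0.45, 0.45, 0.10]

print("forward cross-entropy:", forward_cross_entropy(teacher, student))
print("reverse KL:           ", reverse_kl(teacher, student))
print("teacher entropy:      ", forward_cross_entropy(teacher, teacher))
```

In this toy setting both quantities are smallest when the student equals the teacher: forward cross-entropy bottoms out at the teacher's entropy, and reverse KL at zero. What differs is which distribution weights the log terms, and that is the distinction the summary draws between DistIL and prior self-distillation objectives.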