DAgger
technique · active · provisional
dagger-08f768c1 · 1 event · first seen 13d ago · Aliases: DAgger
Recent events (1)
DistIL: Distributional DAgger for RL from Rich Feedback beyond single-bit rewards
A new arXiv preprint introduces DistIL, a distributional variant of the DAgger (Dataset Aggregation) imitation learning algorithm. DistIL is designed to exploit rich feedback signals, such as execution traces, tool outputs, and expert corrections, rather than the single-bit correctness reward used in standard reinforcement learning with verifiable rewards (RLVR). The method trains with a forward cross-entropy objective, which the authors show provides monotonic policy improvement guarantees, unlike the reverse KL or Jensen-Shannon divergence objectives used in prior self-distillation approaches. The authors report that DistIL outperforms RLVR and self-distillation baselines on scientific reasoning, coding, and hard math benchmarks.
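The loop below is a minimal sketch of DAgger trained with a forward cross-entropy loss against a distributional teacher. It assumes a toy tabular chain environment and is not the DistIL authors' implementation. The environment (`step`), the hypothetical expert (`teacher_dist`), the halving mixture schedule `beta`, and all hyperparameters are illustrative assumptions. Each iteration rolls out a teacher/student mixture, labels every visited state with the teacher's full action distribution (standing in for feedback richer than a single bit), aggregates those labels, and retrains the student on the whole aggregate. `reverse_kl` is included only to show which distribution each objective takes its expectation under. The sketch does not reproduce the paper's improvement guarantee.

```python
import math
import random

# Hypothetical toy setup (illustrative, not from the DistIL paper): a 1-D chain
# of states, three actions (0 = left, 1 = stay, 2 = right), and a teacher that
# returns a full action distribution instead of a single correct/incorrect bit.
N_STATES, N_ACTIONS, HORIZON = 6, 3, 8


def step(state: int, action: int) -> int:
    """Move left, stay, or move right along the chain, clamped at both ends."""
    return min(max(state + action - 1, 0), N_STATES - 1)


def teacher_dist(state: int) -> list[float]:
    """Hypothetical expert: mostly 'right', mostly 'stay' once at the last state."""
    if state == N_STATES - 1:
        return [0.1, 0.8, 0.1]
    return [0.05, 0.15, 0.8]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_cross_entropy(p: list[float], q: list[float]) -> float:
    """H(p, q) = -sum_a p(a) log q(a): expectation under the teacher p."""
    return -sum(pa * math.log(qa) for pa, qa in zip(p, q) if pa > 0)


def reverse_kl(p: list[float], q: list[float]) -> float:
    """KL(q || p) = sum_a q(a) log(q(a) / p(a)): expectation under the student q.
    Assumes p(a) > 0 wherever q(a) > 0."""
    return sum(qa * math.log(qa / pa) for pa, qa in zip(p, q) if qa > 0)


def sample(dist: list[float], rng: random.Random) -> int:
    return rng.choices(range(len(dist)), weights=dist)[0]


def dagger(iterations=5, beta0=1.0, epochs=50, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # tabular student
    dataset = []  # aggregated (state, teacher distribution) pairs
    for i in range(iterations):
        beta = beta0 * 0.5 ** i  # probability of acting with the teacher this round
        state = 0
        for _ in range(HORIZON):
            # Label every visited state with the teacher's full distribution.
            dataset.append((state, teacher_dist(state)))
            use_teacher = rng.random() < beta
            acting = teacher_dist(state) if use_teacher else softmax(logits[state])
            state = step(state, sample(acting, rng))
        # Retrain on the whole aggregate with the forward cross-entropy loss.
        for _ in range(epochs):
            for s, p in dataset:
                q = softmax(logits[s])
                # Gradient of -sum_a p(a) log softmax(z)_a w.r.t. z is softmax(z) - p.
                logits[s] = [z - lr * (qa - pa) for z, qa, pa in zip(logits[s], q, p)]
    return logits, dataset


if __name__ == "__main__":
    logits, data = dagger()
    q0 = softmax(logits[0])
    print("student at state 0:", [round(x, 3) for x in q0])
    print("forward CE:", forward_cross_entropy(teacher_dist(0), q0))
    print("reverse KL:", reverse_kl(teacher_dist(0), q0))
```

Because the gradient of the forward cross-entropy with respect to the logits is simply softmax(z) - p, each update pulls the student's distribution at a visited state toward the teacher's full distribution. States that the student reaches on its own are labeled and added to the training set in later iterations, which is the core DAgger idea.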