Entity · technique

DAgger

technique · active · dagger-08f768c1 · 1 event · first seen Jun 4, 2026

Aliases: DAgger

Co-occurring entities

DistIL · Reinforcement Learning with Verifiable Rewards · Reinforcement Learning from Rich Feedback with Distributional DAgger

More like this (12)

AGDO · Daggr · DAPO · PDrop · DiGress · Digit · DAAM · DAIS · page-agent · AudioDER · DADiff · AG-UI

Recent events (1)

arXiv · cs.AI · Jun 4, 2026

DistIL: Distributional DAgger for RL from Rich Feedback beyond single-bit rewards

A new arXiv preprint introduces DistIL, a distributional variant of the DAgger (Dataset Aggregation) imitation-learning algorithm. DistIL is designed to exploit rich feedback signals, such as execution traces, tool outputs, and expert corrections, rather than the single-bit correctness reward used in standard reinforcement learning with verifiable rewards (RLVR). Its training objective is a forward cross-entropy, which the authors state provides monotonic policy-improvement guarantees, unlike the reverse KL or Jensen-Shannon divergence objectives used in prior self-distillation approaches. Empirically, the authors report that DistIL outperforms RLVR and self-distillation baselines on scientific reasoning, coding, and hard math benchmarks.
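For intuition on the objective distinction described above, the sketch below compares a forward cross-entropy with a reverse KL divergence on a toy three-token distribution. It is a minimal illustration, assuming discrete distributions stored as Python lists; the teacher and student values and the function names are hypothetical, and this is not DistIL's objective or training code. The forward objective averages over the teacher (expert) distribution, so it heavily penalizes a student that drops one of the teacher's modes. The reverse KL averages over the student, so it can prefer a student that collapses onto a single mode.

```python
import math

# Hypothetical toy example: next-token distributions over a 3-token vocabulary.
# The numbers are illustrative only and are not taken from the DistIL paper.
TEACHER = [0.49, 0.02, 0.49]            # bimodal "expert" distribution p
STUDENT_BROAD = [1 / 3, 1 / 3, 1 / 3]   # covers both modes, wastes mass on the gap
STUDENT_COLLAPSED = [0.96, 0.02, 0.02]  # locks onto a single mode


def forward_cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x); expectation taken under the teacher p.

    Terms with p(x) == 0 are skipped (0 * log q = 0 convention).
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def reverse_kl(p, q):
    """KL(q || p) = sum_x q(x) log(q(x) / p(x)); expectation taken under the student q.

    Terms with q(x) == 0 are skipped (0 * log 0 = 0 convention).
    """
    return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q) if qi > 0)


if __name__ == "__main__":
    for name, q in [("broad", STUDENT_BROAD), ("collapsed", STUDENT_COLLAPSED)]:
        print(
            f"{name:>9}: forward CE = {forward_cross_entropy(TEACHER, q):.3f}, "
            f"reverse KL = {reverse_kl(TEACHER, q):.3f}"
        )
```

On these numbers the forward cross-entropy ranks the broad student better (about 1.099 vs 2.015), while the reverse KL ranks the collapsed student better (about 0.582 vs 0.681). The toy only shows which distribution each objective averages over; the monotonic-improvement guarantee is the paper's theoretical claim and is not demonstrated here.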

Frontier Model Releases · Alignment and RLHF · DAgger · DistIL · Reinforcement Learning with Verifiable Rewards