benchmark
Countdown-Stepwise
benchmark · active · provisional
countdown-stepwise-50486b5f · 1 event · first seen 16d ago
Aliases: Countdown-Stepwise
Recent events (1)
ReuseRL: Skill Reuse as Compression in Agentic RL via MDL Principle
ReuseRL formalizes agentic reinforcement learning through the Minimum Description Length (MDL) principle, extracting a shared skill dictionary from successful trajectories and augmenting the RL objective with a segmentation cost that penalizes idiosyncratic, non-reusable behaviors. The authors prove a PAC-Bayes generalization bound for this compression penalty. Evaluated on ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL outperforms vanilla GRPO and round-length baselines on both in-distribution and out-of-distribution tasks.
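The summary above does not give the exact form of the segmentation cost, so the following is only an illustrative sketch of how an MDL-style penalty of this kind could be computed, not ReuseRL's actual formulation. It assumes a trajectory is a list of discrete action tokens, that the skill dictionary is a list of action tuples, and that codes are uniform with an escape symbol for raw actions. The names `segmentation_cost`, `penalized_reward`, `n_primitives`, and `lam` are hypothetical.

```python
import math


def segmentation_cost(trajectory, skills, n_primitives):
    """Minimum number of bits needed to encode `trajectory` as a sequence of
    dictionary skills and raw primitive actions (assumed coding scheme)."""
    # Assumption: uniform code over the skills plus one escape symbol.
    skill_bits = math.log2(len(skills) + 1)
    # A raw action costs the escape symbol plus a uniform primitive index.
    primitive_bits = skill_bits + math.log2(n_primitives)

    n = len(trajectory)
    best = [0.0] + [math.inf] * n  # best[i] = cheapest encoding of the first i actions
    for end in range(1, n + 1):
        # Option 1: encode the last action as a raw primitive.
        best[end] = best[end - 1] + primitive_bits
        # Option 2: encode a suffix as one reusable skill from the dictionary.
        for skill in skills:
            k = len(skill)
            if k <= end and tuple(trajectory[end - k:end]) == tuple(skill):
                best[end] = min(best[end], best[end - k] + skill_bits)
    return best[n]


def penalized_reward(task_reward, trajectory, skills, n_primitives, lam=0.1):
    """Task reward minus a weighted description-length penalty (illustrative)."""
    return task_reward - lam * segmentation_cost(trajectory, skills, n_primitives)


if __name__ == "__main__":
    skills = [("a", "b"), ("c",)]
    reusable = ["a", "b", "c"]       # fully covered by dictionary skills
    idiosyncratic = ["a", "b", "d"]  # ends with an action no skill covers
    print(segmentation_cost(reusable, skills, n_primitives=4))
    print(segmentation_cost(idiosyncratic, skills, n_primitives=4))
```

Under these assumptions, a trajectory that decomposes entirely into dictionary skills receives a smaller penalty than one that needs raw actions, which is the qualitative behavior the summary attributes to the segmentation cost.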