Almanac
benchmark

Countdown-Stepwise

benchmark · active · provisional · countdown-stepwise-50486b5f · 1 event · first seen 16d ago

Aliases: Countdown-Stepwise



Recent events (1)

6 · arXiv · cs.AI · 16d ago

ReuseRL: Skill Reuse as Compression in Agentic RL via MDL Principle

ReuseRL formalizes agentic reinforcement learning through the Minimum Description Length (MDL) principle, extracting a shared skill dictionary from successful trajectories and augmenting the RL objective with a segmentation cost that penalizes idiosyncratic, non-reusable behaviors. The authors prove a PAC-Bayes generalization bound for this compression penalty. Evaluated on ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL outperforms vanilla GRPO and round-length baselines on both in-distribution and out-of-distribution tasks.
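To make the idea of a segmentation cost concrete, here is a minimal sketch under stated assumptions. It is not the authors' formulation. The names `segmentation_cost`, `shaped_reward`, and `lam` are hypothetical. The sketch also assumes a uniform code in which every emitted token (one dictionary skill or one primitive action) costs log2(codebook size) bits, and it omits the dictionary's own description length because the dictionary is shared across trajectories. Dynamic programming finds the segmentation with the fewest tokens. A trajectory that reuses skills therefore compresses well, while an idiosyncratic one has to be spelled out action by action and pays more bits.

```python
import math
from typing import List, Sequence, Tuple

Action = str
Skill = Tuple[Action, ...]


def segmentation_cost(
    trajectory: Sequence[Action],
    skills: Sequence[Skill],
    primitives: Sequence[Action],
) -> Tuple[float, List[Skill]]:
    """Return (bits, segmentation) for the cheapest encoding of `trajectory`.

    Hypothetical cost model: a uniform code over the codebook of skills plus
    single primitive actions, so each token costs log2(len(codebook)) bits.
    """
    # Sorted for deterministic tie-breaking; empty skills are dropped.
    codebook = sorted({tuple(s) for s in skills if s} | {(a,) for a in primitives})
    bits_per_token = math.log2(len(codebook))
    n = len(trajectory)
    # best[i] = fewest tokens that encode trajectory[:i]; back[i] = last token used.
    best = [math.inf] * (n + 1)
    back: List[Skill] = [()] * (n + 1)
    best[0] = 0
    for i in range(1, n + 1):
        for token in codebook:
            k = len(token)
            if k <= i and tuple(trajectory[i - k:i]) == token and best[i - k] + 1 < best[i]:
                best[i] = best[i - k] + 1
                back[i] = token
    if math.isinf(best[n]):
        raise ValueError("trajectory contains an action outside the codebook")
    # Follow back-pointers from the end to recover the segmentation.
    segments: List[Skill] = []
    i = n
    while i > 0:
        segments.append(back[i])
        i -= len(back[i])
    segments.reverse()
    return best[n] * bits_per_token, segments


def shaped_reward(task_reward: float, trajectory, skills, primitives, lam: float = 0.01) -> float:
    """Task reward minus a (hypothetical) weighted description-length penalty."""
    bits, _ = segmentation_cost(trajectory, skills, primitives)
    return task_reward - lam * bits


# Usage with made-up action names.
primitives = ["goto", "open", "take", "put"]
skills = [("goto", "open", "take"), ("goto", "put")]
reusable = ["goto", "open", "take", "goto", "put"]
idiosyncratic = ["open", "goto", "take", "put", "open"]
print(round(segmentation_cost(reusable, skills, primitives)[0], 2))       # 5.17 (2 tokens)
print(round(segmentation_cost(idiosyncratic, skills, primitives)[0], 2))  # 12.92 (5 tokens)
```

Both trajectories are the same length and would earn the same task reward. Under `shaped_reward`, the one that reuses dictionary skills scores higher. This is the kind of pressure toward reusable behavior that the summary describes.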