Entity · benchmark

Countdown-Stepwise

benchmark · active · countdown-stepwise-50486b5f · 1 event · first seen Jun 1, 2026

Aliases: Countdown-Stepwise

Co-occurring entities

Minimum Description Length · ALFWorld · PAC-Bayes · GRPO · ReuseRL · TextWorld-Cooking

More like this (12)

progressive decay schedule · STEP · DABStep · path-wise rewinding · anytime-valid sequential testing · Multiview Counting · progress advantage · Conservative Drifting Method · blockwise decoding · Adaptive Multi-Step Lookahead Decoding for Diffusion Language Models · Step-Audio R1.1 Realtime · Forward-Forward Algorithm

Recent events (1)

6 · arXiv · cs.AI · Jun 1, 2026

ReuseRL: Skill Reuse as Compression in Agentic RL via MDL Principle

ReuseRL formalizes agentic reinforcement learning through the Minimum Description Length (MDL) principle, extracting a shared skill dictionary from successful trajectories and augmenting the RL objective with a segmentation cost that penalizes idiosyncratic, non-reusable behaviors. The authors prove a PAC-Bayes generalization bound for this compression penalty. Evaluated on ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL outperforms vanilla GRPO and round-length baselines on both in-distribution and out-of-distribution tasks.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Minimum Description Length · ALFWorld · Countdown-Stepwise
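
To make the compression idea in the ReuseRL summary above more concrete, here is a minimal Python sketch of an MDL-style segmentation cost. It is not the authors' implementation. The coding scheme (a uniform code over dictionary skills plus one escape symbol for raw actions), the function names `segmentation_cost` and `shaped_reward`, and the weight `lam` are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def segmentation_cost(
    actions: Sequence[str],
    skills: Sequence[Tuple[str, ...]],
    n_primitives: int,
) -> float:
    """Minimum number of bits needed to encode `actions`.

    Assumed coding scheme (hypothetical, for illustration only):
      * each emitted symbol is either a skill index or an escape symbol,
        costing log2(len(skills) + 1) bits;
      * an escape is followed by one raw action, costing an extra
        log2(n_primitives) bits.
    Dynamic programming finds the cheapest segmentation.
    """
    n = len(actions)
    symbol_bits = math.log2(len(skills) + 1)
    primitive_bits = symbol_bits + math.log2(n_primitives)

    # best[i] = cheapest encoding of the first i actions
    best = [0.0] + [math.inf] * n
    for i in range(n):
        # Option 1: spell out the next action as a raw primitive.
        best[i + 1] = min(best[i + 1], best[i] + primitive_bits)
        # Option 2: cover the next few actions with a dictionary skill.
        for skill in skills:
            if not skill:
                continue  # ignore empty skills
            j = i + len(skill)
            if j <= n and tuple(actions[i:j]) == tuple(skill):
                best[j] = min(best[j], best[i] + symbol_bits)
    return best[n]


def shaped_reward(
    task_reward: float,
    actions: Sequence[str],
    skills: Sequence[Tuple[str, ...]],
    n_primitives: int,
    lam: float = 0.01,
) -> float:
    """Task reward minus a weighted description-length penalty (assumed form)."""
    return task_reward - lam * segmentation_cost(actions, skills, n_primitives)


if __name__ == "__main__":
    skills = [("a", "b")]
    reusable = ["a", "b", "a", "b"]       # fully covered by the skill
    idiosyncratic = ["a", "c", "b", "d"]  # must be spelled out action by action
    print(segmentation_cost(reusable, skills, n_primitives=4))
    print(segmentation_cost(idiosyncratic, skills, n_primitives=4))
```

Under these assumptions, a trajectory that decomposes into dictionary skills is cheaper to encode than one of the same length built from one-off actions, so the penalty pushes the policy toward reusable behavior. How the paper actually defines the dictionary, the code lengths, and the penalty weight is not stated in the summary.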