ALFWorld
alfworld-c4e69dec·2 events·first seen 16d agoAliases: ALFWorld
Co-occurring entities
More like this (12)
Recent events (2)
ReuseRL: Skill Reuse as Compression in Agentic RL via MDL Principle
ReuseRL formalizes agentic reinforcement learning through the Minimum Description Length (MDL) principle, extracting a shared skill dictionary from successful trajectories and augmenting the RL objective with a segmentation cost that penalizes idiosyncratic, non-reusable behaviors. The authors prove a PAC-Bayes generalization bound for this compression penalty. Evaluated on ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL outperforms vanilla GRPO and round-length baselines on both in-distribution and out-of-distribution tasks.
RePro: Retrospective Progress-Aware Self-Refinement for LLM Agent Training
Researchers introduce RePro (Retrospective Progress-Aware Training), a framework addressing the gap between step-wise RL optimization and metacognitive task-progress awareness in LLM agents. The approach uses a forward-then-reflect rollout paradigm where agents execute actions online and then retrospectively assess step-wise progress given the completed trajectory and known outcome. Evaluated on WebShop, ALFWorld, and Sokoban, RePro achieves up to 12% absolute success rate gains over baseline Qwen-family models without requiring continuous external supervision.