Almanac
benchmark

Sokoban

benchmarkactiveprovisionalsokoban-5ddd673a·1 events·first seen 2d ago

Aliases: Sokoban

Co-occurring entities

More like this (12)

Recent events (1)

5arXiv · cs.CL·2d ago·source ↗

RePro: Retrospective Progress-Aware Self-Refinement for LLM Agent Training

Researchers introduce RePro (Retrospective Progress-Aware Training), a framework addressing the gap between step-wise RL optimization and metacognitive task-progress awareness in LLM agents. The approach uses a forward-then-reflect rollout paradigm where agents execute actions online and then retrospectively assess step-wise progress given the completed trajectory and known outcome. Evaluated on WebShop, ALFWorld, and Sokoban, RePro achieves up to 12% absolute success rate gains over baseline Qwen-family models without requiring continuous external supervision.