Progress Advantage for LLM Agents
Progress Advantage: Annotation-Free Step-Level Scoring for LLM Agents via RL Post-Training
Researchers introduce "progress advantage," a method that derives implicit step-level reward signals for LLM agents from the log-probability ratio between an RL-trained policy and its reference policy, with no separately trained process reward model. Under a general stochastic MDP formulation, this ratio is shown to recover the optimal advantage function, which makes the method annotation-free and domain-agnostic. It is validated across five benchmarks and four model families on test-time scaling, uncertainty quantification, and failure attribution, where it outperforms confidence-based baselines and even dedicated trained reward models. The result matters in practice because building process reward models for agentic settings is currently a major bottleneck.
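To make the log-probability ratio concrete, here is a minimal sketch of how a step-level score could be computed. The summary does not give the exact formula, so the details are assumptions: each agent step is scored by summing, over its tokens, the difference between the RL-trained policy's log-probability and the reference policy's log-probability, scaled by a hypothetical coefficient `beta`. The function names and the toy numbers are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence


def progress_advantage(
    policy_logprobs: Sequence[Sequence[float]],
    reference_logprobs: Sequence[Sequence[float]],
    beta: float = 1.0,
) -> List[float]:
    """Score each agent step by its policy/reference log-probability ratio.

    Each inner sequence holds the per-token log-probabilities of one step's
    action tokens. The `beta` scale is an assumption, not a value from the source.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("both models must score the same number of steps")

    scores = []
    for step_policy, step_reference in zip(policy_logprobs, reference_logprobs):
        if len(step_policy) != len(step_reference):
            raise ValueError("both models must score the same tokens in a step")
        # log pi_RL(step) - log pi_ref(step), summed over the step's tokens
        scores.append(beta * (sum(step_policy) - sum(step_reference)))
    return scores


def least_supported_step(scores: Sequence[float]) -> int:
    """Return the index of the lowest-scoring step (a naive failure-attribution heuristic)."""
    return min(range(len(scores)), key=lambda i: scores[i])


# Toy two-step trajectory with made-up log-probabilities.
policy = [[-0.5, -0.25], [-2.0, -1.0]]
reference = [[-1.0, -0.75], [-1.0, -0.5]]

scores = progress_advantage(policy, reference)
print(scores)                        # [1.0, -1.5]
print(least_supported_step(scores))  # 1
```

In this toy trajectory the RL-trained policy favors the first step more than the reference does (positive score) and the second step less (negative score), so a simple attribution rule would flag the second step. How scores are aggregated for test-time scaling or uncertainty quantification is not specified in the summary and is left out here.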