paper

Progress Advantage for LLM Agents

paper · active · provisional · progress-advantage-for-llm-agents-b830ea2a · 1 event · first seen 5d ago

Aliases: Progress Advantage for LLM Agents

More like this (12)

Multi-Component LLM Agent · LLM Agent Classroom · LLM agents · Legal Agent Benchmark · LLM Bargaining Agents · progress advantage · Always-OnAgents: A Survey of Persistent Memory, State, and Governance in LLM Agents · Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution · Benchmark Agent · frontier LLMs · MobileLLM-Pro · Agent-Reach

Recent events (1)

arXiv · cs.AI · 5d ago

Progress Advantage: Annotation-Free Step-Level Scoring for LLM Agents via RL Post-Training

Researchers introduce 'progress advantage,' a method that derives implicit step-level reward signals for LLM agents directly from the log-probability ratio between an RL-trained policy and its reference policy, with no dedicated process reward model to train. Under a general stochastic MDP formulation, this ratio is shown to recover the optimal advantage function, which makes the approach annotation-free and domain-agnostic. The method is validated across five benchmarks and four model families on test-time scaling, uncertainty quantification, and failure attribution, where it outperforms confidence-based baselines and even dedicated trained reward models. The result matters in practice because building process reward models for agentic settings is currently a major bottleneck.
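As a rough illustration of the idea, a step-level score can be computed from per-token log-probabilities of the same step under the two policies. The sketch below is not the authors' implementation. It assumes that a step's score is the sum of token-level log-probability differences, that an optional scale `beta` multiplies the ratio (as in other implicit-reward formulations), and that the lowest-scoring step is a plausible candidate for failure attribution. The function names and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def step_progress_advantage(
    policy_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Score one agent step as beta * sum_t [log pi_rl(y_t) - log pi_ref(y_t)].

    Both sequences hold per-token log-probabilities of the SAME tokens,
    evaluated under the RL-trained policy and the reference policy.
    Summing over tokens and scaling by beta are assumptions of this sketch.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("both policies must score the same tokens")
    return beta * sum(p - r for p, r in zip(policy_logprobs, reference_logprobs))


def score_trajectory(
    steps_policy: Sequence[Sequence[float]],
    steps_reference: Sequence[Sequence[float]],
    beta: float = 1.0,
) -> List[float]:
    """Return one score per step of a multi-step agent trajectory."""
    if len(steps_policy) != len(steps_reference):
        raise ValueError("both policies must score the same steps")
    return [
        step_progress_advantage(p, r, beta)
        for p, r in zip(steps_policy, steps_reference)
    ]


def lowest_scoring_step(scores: Sequence[float]) -> int:
    """Index of the step with the smallest score (hypothetical attribution rule)."""
    return min(range(len(scores)), key=lambda i: scores[i])


# Toy trajectory with two steps of two tokens each (made-up log-probabilities).
policy = [[-0.5, -0.25], [-2.0, -1.0]]
reference = [[-1.0, -0.75], [-1.0, -0.5]]

scores = score_trajectory(policy, reference)
print(scores)                       # [1.0, -1.5]
print(lowest_scoring_step(scores))  # 1
```

In this toy case the RL-trained policy assigns more probability than the reference to the first step and less to the second, so the second step receives the negative score. The same per-step scores could in principle be used to rank sampled candidates for test-time scaling, though how the paper aggregates them is not stated in this summary.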

Evaluation and Benchmarking · Agent and Tool Ecosystem · progress advantage · Progress Advantage for LLM Agents