Learning Process Rewards via Success Visitation Matching for Efficient RL

paper · active · provisional · learning-process-rewards-via-success-visitation-matching-for-efficient-rl-7e6e1135 · 1 event · first seen 43h ago

Recent events (1)

arXiv · cs.AI · 43h ago

Success Visitation Matching transforms sparse RL rewards into dense process rewards

A new arXiv paper proposes a method for converting sparse outcome rewards into dense process rewards. A discriminator is trained to distinguish successful from unsuccessful episodes and is then used to guide policy learning toward the state-action visitations of successful trajectories. The approach is proven to preserve the optimal policy while providing denser feedback on task progress. Experiments focus on finetuning robotic manipulation policies in both simulated and real-world settings and show faster RL convergence than sparse-reward baselines.
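The general idea of a success-versus-failure discriminator can be illustrated with a small sketch. This is not the paper's implementation: the logistic-regression discriminator, the use of its logit as the per-step reward, the synthetic 4-dimensional state-action features, and the names `train_discriminator` and `process_reward` are all assumptions made for illustration. The sketch does not reproduce the paper's reward definition or its optimal-policy guarantee.

```python
import numpy as np

def train_discriminator(success_sa, failure_sa, lr=0.1, steps=500):
    """Fit a logistic-regression discriminator D(s, a) = sigmoid(w.x + b).

    Label 1 = state-action pair from a successful episode,
    label 0 = state-action pair from an unsuccessful episode.
    (Hypothetical helper; the paper's discriminator may differ.)
    """
    x = np.vstack([success_sa, failure_sa])
    y = np.concatenate([np.ones(len(success_sa)), np.zeros(len(failure_sa))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(success)
        grad = p - y                            # gradient of the log-loss w.r.t. the logit
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def process_reward(sa, w, b):
    """Dense per-step reward: the discriminator logit, log D / (1 - D).

    Assumption: using the logit as a shaping signal is one common choice
    in discriminator-based RL, not necessarily the paper's formula.
    """
    return sa @ w + b

# Synthetic state-action features standing in for real rollouts (assumption).
rng = np.random.default_rng(0)
success_sa = rng.normal(loc=1.0, scale=1.0, size=(200, 4))
failure_sa = rng.normal(loc=-1.0, scale=1.0, size=(200, 4))

w, b = train_discriminator(success_sa, failure_sa)
r_success = process_reward(success_sa, w, b).mean()
r_failure = process_reward(failure_sa, w, b).mean()
print("mean reward on success-like steps:", r_success)
print("mean reward on failure-like steps:", r_failure)
```

In this toy setup, steps that resemble successful trajectories receive a higher reward than steps that resemble failed ones, so the policy gets feedback at every step instead of only at episode end.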