Learning Process Rewards via Success Visitation Matching for Efficient RL

More like this (12)

ExpRL: Exploratory RL for LLM Mid-Training
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
InSight: Self-Guided Skill Acquisition via Steerable VLAs
InSight: Self-Guided Skill Acquisition via Steerable VLAs
Reinforcement Learning with Verifiable Rewards
Gradient-Guided Reward Optimization
KL-regularized RL
reinforcement learning from verifier feedback
Reward Learning from Comparisons
Reinforcement Learning for Code
Reinforcement Learning from Rich Feedback with Distributional DAgger
RLIF (Reinforcement Learning from Internal Feedback)

Recent events (1)

arXiv · cs.AI · 43h ago

Success Visitation Matching transforms sparse RL rewards into dense process rewards

A new arXiv paper proposes a method for converting sparse outcome rewards into dense process rewards: a discriminator is trained to distinguish successful from unsuccessful episodes, and it is then used to guide policy learning toward the state-action visitations of successful trajectories. The approach is proven to preserve the optimal policy while providing denser feedback on task progress. Experiments focus on fine-tuning for robotic manipulation in both simulated and real-world settings and show faster RL convergence than sparse-reward baselines.
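The general idea of a success discriminator used as a dense reward can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the discriminator is a plain logistic regression, every step inherits the success or failure label of its episode, the dense reward is taken to be the discriminator logit log D - log(1 - D), and the toy features and the names train_success_discriminator and process_reward are hypothetical. The paper's actual architecture, reward form, and training procedure may differ.

```python
import numpy as np


def train_success_discriminator(features, labels, lr=0.5, steps=500):
    """Fit a logistic-regression discriminator D(s, a) = sigmoid(w . x + b).

    features: (n, d) array of state-action features, one row per step.
    labels:   (n,) array, 1.0 if the step came from a successful episode,
              0.0 otherwise (assumption: steps inherit the episode label).
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels  # per-sample gradient of cross-entropy w.r.t. the logit
        w -= lr * (features.T @ err) / n
        b -= lr * err.mean()
    return w, b


def process_reward(features, w, b):
    """Dense per-step reward, assumed here to be the logit log D - log(1 - D)."""
    return features @ w + b


# Toy data (hypothetical): steps from successful episodes cluster around +1,
# steps from failed episodes cluster around -1.
rng = np.random.default_rng(0)
success_steps = rng.normal(loc=1.0, scale=0.5, size=(200, 2))
failure_steps = rng.normal(loc=-1.0, scale=0.5, size=(200, 2))
features = np.vstack([success_steps, failure_steps])
labels = np.concatenate([np.ones(200), np.zeros(200)])

w, b = train_success_discriminator(features, labels)
rewards = process_reward(features, w, b)
mean_success_reward = rewards[:200].mean()
mean_failure_reward = rewards[200:].mean()
```

In this sketch, every step receives a reward signal rather than only the final step of an episode, and steps that resemble those seen in successful episodes score higher on average. Whether such a shaped reward preserves the optimal policy depends on the construction in the paper, which this toy example does not reproduce.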