Learning Process Rewards via Success Visitation Matching for Efficient RL
paper · active · provisional
learning-process-rewards-via-success-visitation-matching-for-efficient-rl-7e6e1135 · 1 event · first seen 43h ago
Aliases: Learning Process Rewards via Success Visitation Matching for Efficient RL
More like this (12)
ExpRL: Exploratory RL for LLM Mid-Training
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
InSight: Self-Guided Skill Acquisition via Steerable VLAs
InSight: Self-Guided Skill Acquisition via Steerable VLAs
Reinforcement Learning with Verifiable Rewards
Gradient-Guided Reward Optimization
KL-regularized RL
reinforcement learning from verifier feedback
Reward Learning from Comparisons
Reinforcement Learning for Code
Reinforcement Learning from Rich Feedback with Distributional DAgger
RLIF (Reinforcement Learning from Internal Feedback)
Recent events (1)
Success Visitation Matching transforms sparse RL rewards into dense process rewards
A new arXiv paper proposes a method for converting sparse outcome rewards into dense process rewards. It trains a discriminator to distinguish successful from unsuccessful episodes, then uses that discriminator to guide policy learning toward the state-action visitations of successful trajectories. The approach is proven to preserve the optimal policy while giving denser feedback on task progress. Experiments focus on finetuning for robotic manipulation, in both simulated and real-world settings, and show faster RL convergence than sparse-reward baselines.
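The summary above describes the mechanism only at a high level. As a rough illustration, and not the paper's implementation, the sketch below fits a logistic-regression discriminator on toy state-action features drawn from "successful" and "unsuccessful" episodes, then uses the discriminator's logit as a dense per-step reward. The 2-D feature vectors, the choice of the logit as the reward, and the names `train_discriminator` and `process_reward` are all assumptions made for illustration. The paper's actual reward form and its optimality-preservation argument are not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_discriminator(success_sa, failure_sa, lr=0.5, steps=500):
    """Fit D(s, a) = sigmoid(w . x + b), labeling success samples 1 and failure samples 0.

    Hypothetical helper: plain logistic regression trained by gradient descent.
    """
    x = np.vstack([success_sa, failure_sa])
    y = np.concatenate([np.ones(len(success_sa)), np.zeros(len(failure_sa))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        grad = p - y  # gradient of the cross-entropy loss w.r.t. the logit
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def process_reward(sa, w, b):
    """Dense per-step reward, assumed here to be the logit log D - log(1 - D)."""
    return sa @ w + b


# Toy data (assumption): state-action features from successful episodes
# cluster around (+1, +1), those from unsuccessful episodes around (-1, -1).
rng = np.random.default_rng(0)
success_sa = rng.normal(loc=1.0, size=(200, 2))
failure_sa = rng.normal(loc=-1.0, size=(200, 2))

w, b = train_discriminator(success_sa, failure_sa)

# Score every step of a short trajectory that drifts toward the success region.
trajectory = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
step_rewards = process_reward(trajectory, w, b)
print(step_rewards)
```

In this toy setup each step of the trajectory receives its own reward, rather than a single signal at episode end. In an RL loop, such per-step values would stand in for, or be added to, the sparse outcome reward. How the paper combines them is not stated in the summary.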