technique
UBP2
technique · active · provisional
ubp2-202f148b · 1 event · first seen 2d ago · Aliases: UBP2
Recent events (1)
UBP2: Model-based preference RL with uncertainty-balanced exploration achieves sublinear regret
UBP2 (Uncertainty-Balanced Preference Planning) is a model-based reinforcement learning method that improves sample efficiency in preference-based RL by jointly reasoning about uncertainty in the reward, dynamics, and value functions. It scores candidate trajectories with ensembles and trades off exploration against exploitation in a principled way, without ad hoc heuristics. The authors prove sublinear regret bounds for both finite- and infinite-horizon settings and show substantially better sample efficiency than model-free baselines on the Meta-World benchmark.
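The entry does not give UBP2's actual scoring rule. As an illustration only, here is a minimal sketch of ensemble-based trajectory scoring. It assumes each ensemble member is a hypothetical (dynamics, reward, value) model triple, and it assumes a UCB-style rule: mean predicted return plus `beta` times the ensemble's standard deviation. The names `member_return`, `score_trajectory`, `select_plan`, and `beta` are hypothetical. UBP2's "balanced" rule may weight the reward, dynamics, and value uncertainties differently.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

State = float
Action = float

# Hypothetical ensemble member: one (dynamics, reward, value) model triple.
# In preference-based RL the reward model would be fit to preference labels;
# here the models are simply given as functions (an assumption of this sketch).
Member = Tuple[
    Callable[[State, Action], State],  # dynamics: predicts the next state
    Callable[[State, Action], float],  # reward: predicts the step reward
    Callable[[State], float],          # value: estimates return after the plan
]


def member_return(member: Member, s0: State, actions: Sequence[Action]) -> float:
    """Predicted return of an action sequence under a single ensemble member."""
    dynamics, reward, value = member
    s, total = s0, 0.0
    for a in actions:
        total += reward(s, a)
        s = dynamics(s, a)
    return total + value(s)  # bootstrap beyond the plan with the value model


def score_trajectory(ensemble: List[Member], s0: State,
                     actions: Sequence[Action], beta: float) -> float:
    """Ensemble mean return plus beta times ensemble disagreement (std).

    Each member rolls out its own dynamics, reward, and value model, so the
    spread of returns reflects uncertainty in all three together.
    """
    returns = [member_return(m, s0, actions) for m in ensemble]
    return statistics.mean(returns) + beta * statistics.pstdev(returns)


def select_plan(ensemble: List[Member], s0: State,
                candidates: List[Sequence[Action]], beta: float = 1.0):
    """Return the candidate action sequence with the highest score."""
    return max(candidates, key=lambda acts: score_trajectory(ensemble, s0, acts, beta))


if __name__ == "__main__":
    # Two members agree on the "safe" action 0.0 but disagree on action 1.0.
    ensemble = [
        (lambda s, a: s + a, lambda s, a: 0.4 if a == 0.0 else 1.0, lambda s: 0.0),
        (lambda s, a: s + a, lambda s, a: 0.4 if a == 0.0 else -0.4, lambda s: 0.0),
    ]
    candidates = [(0.0,), (1.0,)]
    print(select_plan(ensemble, 0.0, candidates, beta=0.0))  # (0.0,)  pure exploitation
    print(select_plan(ensemble, 0.0, candidates, beta=1.0))  # (1.0,)  exploration bonus wins
```

With `beta=0` the planner picks the action whose mean return is higher (0.4 vs. 0.3). With `beta=1` the disagreement on the risky action (std 0.7) makes it worth trying. This single knob is the kind of exploration-exploitation trade-off the entry describes, though the paper's mechanism may differ.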
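For reference, the following is one common way to define regret in episodic RL. It compares the optimal value with the value of the policy π_t executed in episode t, both measured under the true (latent) reward. This is an assumption here: the paper's exact finite- and infinite-horizon definitions may differ.

```latex
\mathrm{Regret}(T) = \sum_{t=1}^{T} \left( V^{\pi^\star}(s_1) - V^{\pi_t}(s_1) \right),
\qquad
\text{sublinear: } \lim_{T \to \infty} \frac{\mathrm{Regret}(T)}{T} = 0 .
```

Take a bound of the form Regret(T) = O(√T) as an example: the average per-episode suboptimality then shrinks like O(1/√T), so the learner's policies approach optimal performance.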