Reward Learning from Comparisons
technique · active
reward-learning-from-comparisons-77fbacf3 · 1 event · first seen 28d ago
Aliases: Reward Learning from Comparisons
More like this (12)
Reinforcement Learning from Human Feedback · contrastive learning · Rule-Based Rewards · Imitation Learning · In-Context Reward Adaptation · reward model · rule-based reinforcement learning rewards · rubric-based reward shaping · Process Reward Model · Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning · Q-learning · Goal-Conditioned Reinforcement Learning
Recent events (1)
Learning from Human Preferences: OpenAI and DeepMind Collaborate on Reward Learning from Comparisons
OpenAI, in collaboration with DeepMind's safety team, published a method for learning reward functions directly from human preference comparisons between pairs of agent behaviors, eliminating the need to hand-code goal functions. The algorithm infers human intent by asking evaluators which of two proposed behaviors is preferable, addressing risks from misspecified reward functions. This work is an early foundational contribution to what would become reinforcement learning from human feedback (RLHF). It targets both safety and alignment concerns around reward hacking and proxy gaming.
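The summary above does not spell out how comparisons become a reward function. One common formalization in this line of work is a Bradley-Terry (logistic) preference model: the probability that an evaluator prefers one behavior segment over another is a sigmoid of the difference in their predicted total rewards, and the reward model is fit by cross-entropy on the recorded choices. The sketch below illustrates that idea under loud assumptions: the linear reward model, the two-feature states, the simulated evaluator with hidden weights `true_w`, and all function names are hypothetical and chosen only for illustration. It is not the published implementation.

```python
import math
import random


def segment_return(weights, segment):
    """Predicted total reward of a segment: sum over steps of w . features."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)


def preference_prob(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    # Numerically stable sigmoid of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def fit_reward(comparisons, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights by gradient ascent on the log-likelihood.

    comparisons: list of (seg_a, seg_b, label); label is 1.0 if the
    evaluator preferred seg_a and 0.0 if they preferred seg_b.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in comparisons:
            p = preference_prob(weights, seg_a, seg_b)
            for i in range(n_features):
                # Difference in summed feature i between the two segments.
                feat_diff = sum(s[i] for s in seg_a) - sum(s[i] for s in seg_b)
                weights[i] += lr * (label - p) * feat_diff
    return weights


# Simulated evaluator (assumption): prefers the segment with the higher
# return under hidden weights that the learner never sees directly.
random.seed(0)
true_w = [1.0, -1.0]


def random_segment(length=3):
    return [[random.random(), random.random()] for _ in range(length)]


comparisons = []
for _ in range(50):
    a, b = random_segment(), random_segment()
    label = 1.0 if segment_return(true_w, a) > segment_return(true_w, b) else 0.0
    comparisons.append((a, b, label))

learned_w = fit_reward(comparisons, n_features=2)

# Fraction of training comparisons the learned reward ranks the same way.
agreement = sum(
    (preference_prob(learned_w, a, b) > 0.5) == (label == 1.0)
    for a, b, label in comparisons
) / len(comparisons)
```

In this toy setup the learner only ever sees which of two segments was chosen, never the reward values themselves, which is the core of the comparison-based approach. In a full pipeline the fitted reward would then be handed to a reinforcement learning algorithm as its training signal; that step is omitted here.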