Reward Learning from Comparisons

technique · active · reward-learning-from-comparisons-77fbacf3 · 1 event · first seen May 20, 2026

Aliases: Reward Learning from Comparisons

Co-occurring entities

DeepMind · Reinforcement Learning from Human Feedback · OpenAI

More like this (12)

Reinforcement Learning from Human Feedback · contrastive learning · Rule-Based Rewards · Imitation Learning · What do Reward Models Memorize? · In-Context Reward Adaptation · reward model · rule-based reinforcement learning rewards · Improving LLM-Generated Process Model Quality Through Reinforcement Learning: The Role of Reward Function Design · rubric-based reward shaping · Process Reward Model · Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Recent events (1)

7 · OpenAI Blog · May 20, 2026

Learning from Human Preferences: OpenAI and DeepMind Collaborate on Reward Learning from Comparisons

OpenAI, in collaboration with DeepMind's safety team, published a method for learning a reward function directly from human preference comparisons between pairs of agent behaviors, removing the need to hand-code a goal function. The algorithm infers human intent by asking evaluators which of two proposed behaviors is preferable, which addresses the risk of a misspecified reward function. The work is an early foundation for what later became reinforcement learning from human feedback (RLHF). It targets safety and alignment concerns such as reward hacking and proxy gaming.
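As a rough illustration of how a reward function can be fitted from pairwise comparisons, the sketch below may help. The summary above does not say which statistical model is used, so the sketch assumes a Bradley-Terry (logistic) preference model and a linear reward over feature vectors. The feature vectors, the simulated evaluator, and all function names (`fit_reward`, `make_comparisons`, `agreement`) are hypothetical and exist only for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reward(w, x):
    # Assumed linear reward model: r(x) = w . x
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_reward(comparisons, dim, lr=0.5, epochs=200):
    """Fit reward weights from (preferred, rejected) feature pairs.

    Assumes P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
    and maximizes the mean log-likelihood by batch gradient ascent.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for better, worse in comparisons:
            p = sigmoid(reward(w, better) - reward(w, worse))
            # Gradient of log sigmoid(d) with respect to w is (1 - p) * (better - worse).
            for i in range(dim):
                grad[i] += (1.0 - p) * (better[i] - worse[i])
        w = [wi + lr * g / len(comparisons) for wi, g in zip(w, grad)]
    return w


def make_comparisons(true_w, n, rng):
    # Simulated evaluator (an assumption): always prefers the behavior
    # with the higher hidden "true" reward.
    pairs = []
    for _ in range(n):
        a = [rng.uniform(-1.0, 1.0) for _ in true_w]
        b = [rng.uniform(-1.0, 1.0) for _ in true_w]
        if reward(true_w, a) >= reward(true_w, b):
            pairs.append((a, b))
        else:
            pairs.append((b, a))
    return pairs


def agreement(w, comparisons):
    # Fraction of pairs where the learned reward ranks the preferred item higher.
    hits = sum(1 for better, worse in comparisons
               if reward(w, better) > reward(w, worse))
    return hits / len(comparisons)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_w = [2.0, -1.0]  # never shown to the learner
    train = make_comparisons(hidden_w, 200, rng)
    learned_w = fit_reward(train, dim=2)
    print("learned weights:", learned_w)
    print("agreement with evaluator:", agreement(learned_w, train))
```

The learner never sees the hidden weights, only which item in each pair was preferred. A full system would replace the linear model with a learned function of agent behavior and use the fitted reward to train a policy, which this sketch leaves out.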

Evaluation and Benchmarking · AI Safety Research · Reward Learning from Comparisons · DeepMind · Reinforcement Learning from Human Feedback