Reward Learning from Comparisons

technique · active · reward-learning-from-comparisons-77fbacf3 · 1 event · first seen 28d ago

Aliases: Reward Learning from Comparisons

Recent events (1)

7 · OpenAI Blog · 28d ago

Learning from Human Preferences: OpenAI and DeepMind Collaborate on Reward Learning from Comparisons

OpenAI, in collaboration with DeepMind's safety team, published a method for learning a reward function directly from human preference comparisons between pairs of agent behaviors, removing the need to hand-code a goal function. The algorithm infers what the human wants by repeatedly asking evaluators which of two proposed behaviors is preferable, which reduces the risk of optimizing a misspecified reward function. The work is an early foundational contribution to what later became reinforcement learning from human feedback (RLHF). It targets safety and alignment concerns such as reward hacking and proxy gaming.
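The comparison-based idea described above can be illustrated with a minimal sketch. It is not the published implementation. It assumes a reward that is linear in hand-picked features, a Bradley-Terry style logistic model for the probability that one segment is preferred, and a simulated evaluator with hidden weights standing in for a human. All function names (`fit_reward`, `preference_prob`, `demo`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def segment_features(segment):
    # A segment is a list of per-step feature vectors; sum each feature over steps.
    return [sum(column) for column in zip(*segment)]


def predicted_return(weights, segment):
    # Assumption: reward is linear in the features, so return = weights . summed features.
    return sum(w * f for w, f in zip(weights, segment_features(segment)))


def preference_prob(weights, seg_a, seg_b):
    # Bradley-Terry style model: P(a preferred over b) = sigmoid(return(a) - return(b)).
    return sigmoid(predicted_return(weights, seg_a) - predicted_return(weights, seg_b))


def fit_reward(comparisons, n_features, lr=0.05, epochs=50):
    # comparisons: list of (seg_a, seg_b, label), label 1.0 if seg_a was preferred, else 0.0.
    # Stochastic gradient ascent on the log-likelihood of the observed preferences.
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in comparisons:
            p = preference_prob(weights, seg_a, seg_b)
            fa, fb = segment_features(seg_a), segment_features(seg_b)
            for i in range(n_features):
                weights[i] += lr * (label - p) * (fa[i] - fb[i])
    return weights


def random_segment(rng, n_steps=5, n_features=2):
    return [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_steps)]


def simulated_label(true_weights, seg_a, seg_b):
    # Stand-in for a human evaluator: prefers the segment with the higher hidden return.
    better = predicted_return(true_weights, seg_a) > predicted_return(true_weights, seg_b)
    return 1.0 if better else 0.0


def demo(seed=0):
    rng = random.Random(seed)
    true_weights = [1.0, -2.0]  # hidden from the learner; illustrative values only

    def make_comparisons(n):
        pairs = []
        for _ in range(n):
            a, b = random_segment(rng), random_segment(rng)
            pairs.append((a, b, simulated_label(true_weights, a, b)))
        return pairs

    train, held_out = make_comparisons(200), make_comparisons(200)
    weights = fit_reward(train, n_features=2)
    correct = sum(
        (preference_prob(weights, a, b) > 0.5) == (label == 1.0)
        for a, b, label in held_out
    )
    return weights, correct / len(held_out)


if __name__ == "__main__":
    learned, accuracy = demo()
    print("learned weights:", learned)
    print("held-out agreement with evaluator:", accuracy)
```

The learner never sees the hidden weights, only which segment of each pair was preferred. In a full system the learned reward would then be handed to a reinforcement learning algorithm, and the linear model would typically be replaced by a neural network. Both steps are outside the scope of this sketch.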