Entity · paper

Scaling Laws for Reward Model Overoptimization

paper · active · scaling-laws-for-reward-model-overoptimization-fa025a75 · 1 event · first seen May 20, 2026

Aliases: Scaling Laws for Reward Model Overoptimization

Co-occurring entities

KL Divergence · Goodhart's Law · Reinforcement Learning from Human Feedback · OpenAI

More like this (12)

Scaling Laws for Neural Language Models · Gradient-Guided Reward Optimization · Improving LLM-Generated Process Model Quality Through Reinforcement Learning: The Role of Reward Function Design · AI scaling laws · RREDCoT: Segment-Level Reward Redistribution for Reasoning Models · Process Reward Model · reward model · power-law scaling · What do Reward Models Memorize? · Rule-Based Rewards · Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning · Responsible Scaling Policy

Recent events (1)

7 · OpenAI Blog · May 20, 2026

Scaling Laws for Reward Model Overoptimization

OpenAI published research investigating how reward model overoptimization scales with policy and reward model size in RLHF pipelines. The work characterizes how gold-standard reward changes with the KL divergence from the initial policy, finding that gold reward degrades in a predictable way as optimization pressure against the proxy reward model increases. This provides empirical grounding for understanding Goodhart's Law dynamics in language model fine-tuning and has implications for designing safer, more robust RLHF training regimes.
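
To make the relationship between KL divergence and gold reward concrete, the sketch below evaluates the two functional forms the paper is generally cited for: R_bon(d) = d(α − βd) for best-of-n sampling and R_RL(d) = d(α − β log d) for reinforcement learning, where d is the square root of the KL divergence from the initial policy. The function names and the coefficient values are illustrative assumptions, not fitted values from the paper, and the peak formulas are simply the stationary points of those two expressions.

```python
import math


def bon_gold_reward(d, alpha, beta):
    """Best-of-n form: R(d) = d * (alpha - beta * d), with d = sqrt(KL)."""
    return d * (alpha - beta * d)


def rl_gold_reward(d, alpha, beta):
    """RL form: R(d) = d * (alpha - beta * log d), taken as 0 at d = 0 (its limit)."""
    if d == 0:
        return 0.0
    return d * (alpha - beta * math.log(d))


def bon_peak(alpha, beta):
    """Stationary point of the best-of-n form: d* = alpha / (2 * beta)."""
    d_star = alpha / (2 * beta)
    return d_star, bon_gold_reward(d_star, alpha, beta)


def rl_peak(alpha, beta):
    """Stationary point of the RL form: d* = exp(alpha / beta - 1)."""
    d_star = math.exp(alpha / beta - 1)
    return d_star, rl_gold_reward(d_star, alpha, beta)


if __name__ == "__main__":
    # Hypothetical coefficients chosen only to show the rise-then-fall shape.
    alpha, beta = 1.0, 0.1
    for kl in (0, 1, 4, 25, 100, 225):
        d = math.sqrt(kl)
        print(f"KL={kl:>3}  d={d:5.2f}  best-of-n gold reward={bon_gold_reward(d, alpha, beta):6.3f}")
    print("best-of-n peak (d*, R*):", bon_peak(alpha, beta))
    print("RL peak (d*, R*):", rl_peak(1.0, 0.5))
```

Under these assumed coefficients the gold reward climbs at small KL, peaks, and then declines as the policy drifts further from its starting point, which is the overoptimization pattern the summary describes. In the paper the coefficients are fitted per reward model size; here they are placeholders.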

Evaluation and Benchmarking · AI Safety Research · KL Divergence · Goodhart's Law · Scaling Laws for Reward Model Overoptimization