Scaling Laws for Reward Model Overoptimization

OpenAI Blog · 28d ago

OpenAI published research investigating how reward model overoptimization scales with policy and reward model size in RLHF pipelines. The work characterizes gold-standard reward as a function of KL divergence from the initial policy: as optimization pressure against the proxy reward model increases, the gold reward degrades in a predictable way. This gives empirical grounding to Goodhart's Law dynamics in language model fine-tuning and has implications for designing safer, more robust RLHF training regimes.
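The relationship between KL divergence and gold reward can be illustrated with a small sketch. It assumes the functional forms reported in the paper, written in terms of d = sqrt(KL): d(α − βd) for best-of-n sampling and d(α − β log d) for reinforcement learning. The summary above does not state these forms, so treat them as an assumption here. The function names and the coefficient values are hypothetical, chosen only for illustration, and are not fitted values from the paper.

```python
import math


def kl_distance(kl_nats: float) -> float:
    """Convert a KL divergence (in nats) to the distance d = sqrt(KL)."""
    if kl_nats < 0:
        raise ValueError("KL divergence must be non-negative")
    return math.sqrt(kl_nats)


def gold_reward_bon(d: float, alpha: float, beta: float) -> float:
    """Assumed best-of-n form: R(d) = d * (alpha - beta * d)."""
    return d * (alpha - beta * d)


def gold_reward_rl(d: float, alpha: float, beta: float) -> float:
    """Assumed RL form: R(d) = d * (alpha - beta * log d)."""
    if d == 0:
        return 0.0  # d * log(d) tends to 0 as d approaches 0
    return d * (alpha - beta * math.log(d))


def peak_distance_bon(alpha: float, beta: float) -> float:
    """Solve dR/dd = alpha - 2 * beta * d = 0 for d."""
    return alpha / (2 * beta)


def peak_distance_rl(alpha: float, beta: float) -> float:
    """Solve dR/dd = alpha - beta * (log d + 1) = 0 for d."""
    return math.exp(alpha / beta - 1)


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    alpha, beta = 1.0, 0.25
    for kl in (0.0, 1.0, 4.0, 16.0, 64.0):
        d = kl_distance(kl)
        print(
            f"KL={kl:5.1f}  d={d:4.1f}  "
            f"BoN={gold_reward_bon(d, alpha, beta):6.3f}  "
            f"RL={gold_reward_rl(d, alpha, beta):6.3f}"
        )
    print("BoN peak at d =", peak_distance_bon(alpha, beta))
    print("RL peak at d  =", peak_distance_rl(alpha, beta))
```

Under both assumed forms the gold reward rises at first, peaks, and then falls as d keeps growing, which is the overoptimization pattern the summary describes. The peak helpers give the distance beyond which further optimization against the proxy lowers the gold reward in this toy setting.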