paper
Scaling Laws for Reward Model Overoptimization
paper · active
scaling-laws-for-reward-model-overoptimization-fa025a75 · 1 events · first seen 28d ago
Aliases: Scaling Laws for Reward Model Overoptimization
Co-occurring entities
More like this (12)
Scaling Laws for Neural Language Models
Gradient-Guided Reward Optimization
AI scaling laws
RREDCoT: Segment-Level Reward Redistribution for Reasoning Models
Process Reward Model
reward model
power-law scaling
Rule-Based Rewards
Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning
Responsible Scaling Policy
Anthropic Responsible Scaling Policy
In-Context Reward Adaptation
Recent events (1)
Scaling Laws for Reward Model Overoptimization
OpenAI published research investigating how reward model overoptimization scales with policy size and reward model size in RLHF pipelines. The work characterizes gold-standard reward as a function of KL divergence from the initial policy, finding that gold reward first rises and then degrades in a predictable way as optimization pressure against the proxy reward model increases. This gives empirical grounding to Goodhart's Law dynamics in language model fine-tuning and has implications for designing safer, more robust RLHF training regimes.
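
The relationship between KL divergence and gold reward described above can be illustrated with a short sketch. It assumes the functional forms reported in the paper, written in terms of d = sqrt(KL): d(α − βd) for best-of-n sampling and d(α − β log d) for RL. The function names and the coefficient values used below are hypothetical placeholders chosen for illustration, not fitted values from the paper.

```python
import math


def gold_reward_bon(d: float, alpha: float, beta: float) -> float:
    """Best-of-n form: R(d) = d * (alpha - beta * d), where d = sqrt(KL)."""
    return d * (alpha - beta * d)


def gold_reward_rl(d: float, alpha: float, beta: float) -> float:
    """RL form: R(d) = d * (alpha - beta * log d), where d = sqrt(KL)."""
    if d == 0.0:
        return 0.0  # limit of d * log(d) as d -> 0 is 0
    return d * (alpha - beta * math.log(d))


def peak_distance_bon(alpha: float, beta: float) -> float:
    """Distance d at which the best-of-n form is maximized (dR/dd = 0)."""
    return alpha / (2.0 * beta)


def peak_distance_rl(alpha: float, beta: float) -> float:
    """Distance d at which the RL form is maximized (dR/dd = 0)."""
    return math.exp(alpha / beta - 1.0)


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    alpha, beta = 1.0, 0.25
    for name, reward, peak in (
        ("best-of-n", gold_reward_bon, peak_distance_bon),
        ("RL", gold_reward_rl, peak_distance_rl),
    ):
        d_star = peak(alpha, beta)
        print(f"{name}: gold reward peaks at d = {d_star:.3f}, "
              f"R = {reward(d_star, alpha, beta):.3f}")
```

Under these assumed forms, gold reward rises with optimization, peaks at a distance determined by the ratio of α to β, and declines beyond it, which is the overoptimization pattern the summary describes.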