Almanac
other

Goodhart's Law

otheractivegoodhart-s-law-105774e0·2 events·first seen 28d ago

Aliases: Goodhart's Law

Co-occurring entities

More like this (12)

Recent events (2)

5Openai Blog·28d ago·source ↗

Measuring Goodhart's Law

OpenAI published a blog post examining Goodhart's Law in the context of AI training, where optimizing a proxy objective can cause it to diverge from the true underlying goal. The post addresses the challenge of measuring and optimizing objectives that are difficult or costly to evaluate directly. This is directly relevant to reward hacking, specification gaming, and alignment research at OpenAI.

7Openai Blog·28d ago·source ↗

Scaling Laws for Reward Model Overoptimization

OpenAI published research investigating how reward model overoptimization scales with policy and reward model size in RLHF pipelines. The work characterizes the relationship between KL divergence from the initial policy and gold-standard reward, finding predictable degradation patterns as optimization pressure increases. This provides empirical grounding for understanding Goodhart's Law dynamics in language model fine-tuning and has implications for designing safer, more robust RLHF training regimes.