Almanac
technique

outcome supervision

technique · active · outcome-supervision-6b0719ad · 1 event · first seen 28d ago

Aliases: outcome supervision


Recent events (1)

7 · OpenAI Blog · 28d ago

Improving Mathematical Reasoning with Process Supervision

OpenAI trained a model that achieves state-of-the-art mathematical problem solving by rewarding each correct step of reasoning (process supervision) rather than only a correct final answer (outcome supervision). Process supervision improves performance on math benchmarks and also carries an alignment benefit: it trains the model to produce chain-of-thought reasoning that humans endorse. The work highlights a potential synergy between capability improvements and alignment techniques.
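The difference between the two reward schemes can be illustrated with a toy sketch. This is not OpenAI's implementation: the `Solution` class, the per-step correctness labels (assumed to come from a human or grader), and the functions `outcome_reward`, `process_reward`, and `process_score` are hypothetical names chosen for illustration, and scoring a whole solution as the product of its step rewards is just one simple aggregation assumed here.

```python
import math
from dataclasses import dataclass


@dataclass
class Solution:
    """A hypothetical model-written solution to a math problem."""
    steps: list[str]          # reasoning steps, in order
    step_correct: list[bool]  # assumed per-step labels (e.g. from a human rater)
    final_answer: str


def outcome_reward(sol: Solution, reference_answer: str) -> float:
    # Outcome supervision: a single signal for the whole solution,
    # based only on whether the final answer matches the reference.
    return 1.0 if sol.final_answer == reference_answer else 0.0


def process_reward(sol: Solution) -> list[float]:
    # Process supervision: one signal per reasoning step.
    return [1.0 if ok else 0.0 for ok in sol.step_correct]


def process_score(sol: Solution) -> float:
    # Assumed aggregation: the solution scores 1.0 only if every step is correct.
    return math.prod(process_reward(sol))


# Problem: 4 / 2 + 1 (answer "3").
clean = Solution(["4 / 2 = 2", "2 + 1 = 3"], [True, True], "3")
# Two wrong steps that happen to land on the right answer.
lucky = Solution(["4 / 2 = 1", "1 + 1 = 3"], [False, False], "3")

print(outcome_reward(lucky, "3"))  # 1.0  (outcome supervision rewards it)
print(process_reward(lucky))       # [0.0, 0.0]
print(process_score(lucky))        # 0.0  (process supervision does not)
print(process_score(clean))        # 1.0
```

The `lucky` solution shows why per-step feedback matters: outcome supervision rewards flawed reasoning that reaches the right answer by accident, while process supervision penalizes each incorrect step.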