Almanac
technique

monitorability

technique · active · monitorability-21db5792 · 1 event · first seen 28d ago

Aliases: monitorability

Recent events (1)

OpenAI Blog · 28d ago

Reasoning models struggle to control their chains of thought, and that's good

OpenAI introduces CoT-Control, a framework for evaluating how well reasoning models can deliberately manipulate or suppress their chain-of-thought (CoT) outputs. The finding that models struggle to control their CoT is framed as a positive safety property: if a model cannot easily reshape or hide its reasoning, the visible trace remains a meaningful signal for monitoring. This contributes to ongoing research on whether chain-of-thought transparency is a reliable tool for alignment and oversight.