benchmark
CoT-Control
benchmark · active
cot-control-59db0b74 · 1 event · first seen 28d ago
Aliases: CoT-Control
Recent events (1)
Reasoning models struggle to control their chains of thought, and that's good
OpenAI introduces CoT-Control, a framework for evaluating how well reasoning models can deliberately manipulate or suppress their chain-of-thought outputs. The finding that models struggle to control their CoT is framed as a positive safety property: a model that cannot reliably shape its reasoning trace is also less able to hide its reasoning from a monitor. This reinforces the argument that visible reasoning traces serve as a meaningful monitorability safeguard, and contributes to ongoing research on whether chain-of-thought transparency is a reliable alignment and oversight tool.