When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following
Recent events (1)
Study finds thinking mode in LRMs shifts instruction-following errors by constraint type rather than uniformly degrading performance
A new arXiv paper investigates how toggling built-in chain-of-thought reasoning (Thinking ON vs. OFF) in Qwen3 and Hunyuan models affects instruction following on IFEval. Aggregate pass-rate changes are small, but 10-20% of prompts switch outcomes between the two modes. 'Planning' constraints (global counting, structure) improve under thinking, while 'Precision' constraints (exact local form) consistently worsen. Activation patching and trace-relevance analyses reveal an execution gap: thinking traces engage with Planning constraints but fail to translate that engagement into compliance, while Precision failures are more mechanistically recoverable. The findings bear on when to enable reasoning modes in instruction-following applications.
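To illustrate how a small aggregate change can coexist with a sizeable share of prompts switching outcomes, here is a minimal Python sketch. The function name `paired_shift` and the toy pass/fail lists are hypothetical and are not taken from the paper; the sketch only assumes that each prompt has one pass/fail result with thinking off and one with thinking on.

```python
from typing import Dict, Sequence


def paired_shift(off: Sequence[bool], on: Sequence[bool]) -> Dict[str, float]:
    """Compare per-prompt pass/fail outcomes for two modes (hypothetical helper)."""
    if len(off) != len(on):
        raise ValueError("both modes must be scored on the same prompts")
    if not off:
        raise ValueError("need at least one prompt")
    n = len(off)
    # Prompts that fail with thinking off but pass with thinking on.
    gained = sum(1 for a, b in zip(off, on) if not a and b)
    # Prompts that pass with thinking off but fail with thinking on.
    lost = sum(1 for a, b in zip(off, on) if a and not b)
    return {
        "gained": gained,
        "lost": lost,
        "net_change": (gained - lost) / n,  # aggregate pass-rate difference
        "flip_rate": (gained + lost) / n,   # share of prompts that switch outcome
    }


# Toy data (made up for illustration): 10 prompts, one gain and one loss.
off = [True] * 6 + [False] * 4
on = [True] * 5 + [False] + [True] + [False] * 3
stats = paired_shift(off, on)
print(stats)
```

In this toy case the gains and losses cancel, so the net pass-rate change is zero even though two of the ten prompts flipped. Grouping the same calculation by constraint type would show where the flips concentrate.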