IFEval
ifeval-493b4c08 · 2 events · first seen 15d ago
Aliases: IFEval
Recent events (2)
Study finds thinking mode in LRMs shifts instruction-following errors by constraint type rather than uniformly degrading performance
A new arXiv paper investigates how enabling built-in chain-of-thought reasoning ('Thinking ON/OFF') in Qwen3 and Hunyuan models affects instruction following on IFEval. Aggregate pass-rate changes are small but 10-20% of prompts switch outcomes, with 'Planning' constraints (global counting, structure) improving under thinking while 'Precision' constraints (exact local form) consistently worsen. Activation patching and trace-relevance analyses reveal an execution gap: thinking traces engage with Planning constraints but fail to translate that engagement into compliance, while Precision failures are more mechanistically recoverable. The findings have practical implications for when to enable reasoning modes in instruction-following applications.
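The gap between a small aggregate change and a large share of switched prompts can be made concrete with a short sketch. It assumes per-prompt pass/fail results are available for both modes; the function name `switch_stats` and the toy data are hypothetical and are not taken from the paper.

```python
def switch_stats(off, on):
    """Compare per-prompt pass/fail outcomes with thinking OFF vs. ON.

    off, on: equal-length lists of booleans (True = prompt passed).
    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(off) != len(on) or not off:
        raise ValueError("need two non-empty lists of equal length")
    n = len(off)
    # Prompts that fail with thinking OFF but pass with thinking ON.
    gained = sum(1 for a, b in zip(off, on) if not a and b)
    # Prompts that pass with thinking OFF but fail with thinking ON.
    lost = sum(1 for a, b in zip(off, on) if a and not b)
    return {
        "pass_rate_off": sum(off) / n,
        "pass_rate_on": sum(on) / n,
        "net_change": (gained - lost) / n,   # what the aggregate metric sees
        "switch_rate": (gained + lost) / n,  # share of prompts that flipped
    }


# Made-up toy data: 20 prompts, 2 gained and 2 lost under thinking.
off = [True] * 14 + [False] * 6
on = [True] * 12 + [False] * 2 + [True] * 2 + [False] * 4
stats = switch_stats(off, on)
```

In this toy case the gains and losses cancel, so the pass rate is identical in both modes even though a fifth of the prompts changed outcome. A net-change figure alone would hide that.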
Activation Capping Technique Stabilizes LLM Assistant Personas Against Drift and Jailbreaks
Researchers from MATS, Oxford, and Anthropic introduced the 'assistant axis,' a vector derived from LLM layer outputs that quantifies how closely a model adheres to its trained assistant persona. They developed 'activation capping,' an inference-time method that corrects deviations from this axis when similarity falls below a threshold. Testing on Gemma 2 27B, Qwen3 32B, and Llama 3.3 70B showed harmful response rates to jailbreak prompts dropped by roughly half (e.g., 83% to 41% for Qwen3 32B) without degrading benchmark performance. The technique targets character-based jailbreaks that bypass system prompts by manipulating a model's internal representational state.
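The capping step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the researchers' implementation: it measures adherence as the scalar projection of one activation vector onto a unit-normalized axis, and the name `cap_activation`, the threshold value, and the choice of projection as the similarity measure are all hypothetical.

```python
import math


def cap_activation(h, axis, threshold):
    """Push activation h back along `axis` if its projection is too low.

    h, axis: equal-length lists of floats (one hidden-state vector and the
    assumed 'assistant axis' direction). threshold: minimum allowed
    projection. All three are illustrative assumptions.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0:
        raise ValueError("axis must be non-zero")
    unit = [a / norm for a in axis]
    # Scalar projection of h onto the unit axis.
    proj = sum(x * u for x, u in zip(h, unit))
    if proj >= threshold:
        return list(h)  # already at or above the threshold: leave unchanged
    # Add just enough of the axis direction to reach the threshold.
    shift = threshold - proj
    return [x + shift * u for x, u in zip(h, unit)]


# Toy 2-D example: the axis is the first coordinate.
capped = cap_activation([0.2, 3.0], [1.0, 0.0], threshold=1.0)
untouched = cap_activation([2.0, 3.0], [1.0, 0.0], threshold=1.0)
```

Only the component along the axis is modified; everything orthogonal to it passes through unchanged, which is one plausible reason such an intervention could leave benchmark performance largely intact.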