Entity · benchmark

IFEval

benchmark · active · ifeval-493b4c08 · 3 events · first seen Jun 1, 2026

Aliases: IFEval

Co-occurring entities

Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship · When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following · Hunyuan · Alibaba · Qwen3 · Tencent · Gemma 2 9B · assistant axis · Llama 3.1 70B · EQ-Bench · DeepSeek V4 · ML Alignment & Theory Scholars Program · MMLU-Pro · activation capping · Qwen3 32B · University of Oxford · Christina Lu · GSM8K · Anthropic

More like this (12)

L-Eval · T-Eval · ValueEval · G-Eval · ParaEval · HumanEvalFIM · STAGE-Eval · HypoEval · Arabic Instruction Following Eval (IFEval) · Verilog-Eval · Every Eval Ever · SummEval

Recent events (3)

5 · arXiv · cs.CL · Jun 19, 2026

Study finds no detectable self-preference bias when LLMs revise their own instruction-following drafts

A new arXiv preprint tests whether LLMs resist valid corrections to their own writing. It uses IFEval's deterministic verifier to establish ground-truth correctness, which avoids the subjectivity of model-as-judge scoring. Across four mid-tier model families and 85 author-versus-fresh comparisons, no statistically significant self-preference bias was detected (gap -5.1 pp, 95% CI [-12.9, +2.7]). Qualitatively, when authors do reject verified-good fixes, 97% of their stated reasons cite substantive flaws rather than preference. The result challenges the assumption that self-preference documented in judging tasks extends to self-revision contexts.

Evaluation and Benchmarking · Alignment and RLHF · Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship · IFEval
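The summary above relies on a deterministic verifier to decide whether a response satisfies its instructions. A minimal sketch of that idea follows. The three checks and the `verify` helper are hypothetical illustrations of rule-based constraint checking, not IFEval's actual code or API, and the constraint types are chosen only for the example.

```python
import re

# Hypothetical rule-based checks in the spirit of verifiable instructions.
# These are illustrative assumptions, not IFEval's own implementation.

def check_min_words(response: str, min_words: int) -> bool:
    """True if the response has at least `min_words` whitespace-separated words."""
    return len(response.split()) >= min_words


def check_no_commas(response: str) -> bool:
    """True if the response contains no comma characters."""
    return "," not in response


def check_bullet_count(response: str, expected: int) -> bool:
    """True if the response has exactly `expected` lines starting with '*' or '-'."""
    bullets = re.findall(r"^[ \t]*[*-][ \t]+\S", response, flags=re.MULTILINE)
    return len(bullets) == expected


def verify(response: str, checks) -> bool:
    """A response passes only if every attached check passes."""
    return all(check(response) for check in checks)


# Usage: the same deterministic checks score a draft and a proposed revision.
checks = [
    lambda r: check_bullet_count(r, 2),
    check_no_commas,
    lambda r: check_min_words(r, 4),
]
draft = "* apples, pears\n* plums"
revision = "* apples and pears\n* plums"
draft_ok = verify(draft, checks)
revision_ok = verify(revision, checks)
```

Because the checks are pure functions of the text, a draft and its revision can be scored identically no matter which model wrote them.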

5 · arXiv · cs.CL · Jun 9, 2026

Study finds thinking mode in LRMs shifts instruction-following errors by constraint type rather than uniformly degrading performance

A new arXiv paper investigates how enabling built-in chain-of-thought reasoning ('Thinking ON/OFF') in Qwen3 and Hunyuan models affects instruction following on IFEval. Aggregate pass-rate changes are small, but 10-20% of prompts switch outcomes: 'Planning' constraints (global counting, structure) improve under thinking, while 'Precision' constraints (exact local form) consistently worsen. Activation patching and trace-relevance analyses reveal an execution gap: thinking traces engage with Planning constraints but fail to translate that engagement into compliance, while Precision failures are more mechanistically recoverable. The findings have practical implications for when to enable reasoning modes in instruction-following applications.

Frontier Model Releases · Evaluation and Benchmarking · When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following · Hunyuan · Alibaba · +3 more
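The summary above notes that a small aggregate change can coexist with many per-prompt switches. A minimal sketch of that arithmetic follows. The `outcome_shift` function is a hypothetical helper and the pass/fail lists are made-up toy values, not data from the paper.

```python
def outcome_shift(off, on):
    """Compare per-prompt pass/fail results with thinking OFF versus ON.

    `off` and `on` are equal-length lists of booleans, one entry per prompt.
    Returns the net pass-rate change and the share of prompts whose outcome
    switched in either direction.
    """
    if len(off) != len(on) or not off:
        raise ValueError("need two non-empty lists of equal length")
    n = len(off)
    gained = sum(1 for a, b in zip(off, on) if not a and b)  # fail -> pass
    lost = sum(1 for a, b in zip(off, on) if a and not b)    # pass -> fail
    return {
        "gained": gained,
        "lost": lost,
        "delta_pass_rate": (gained - lost) / n,
        "switch_rate": (gained + lost) / n,
    }


# Toy data (invented for illustration): 10 prompts, one gain and one loss.
thinking_off = [True] * 6 + [False] * 4
thinking_on = [True] * 5 + [False] + [True] + [False] * 3
shift = outcome_shift(thinking_off, thinking_on)
```

In this toy case the gains and losses cancel, so the aggregate pass rate is unchanged even though 20% of prompts switched outcome. This is why a per-constraint breakdown can reveal shifts that the aggregate hides.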

6 · The Batch · Jun 1, 2026

Activation Capping Technique Stabilizes LLM Assistant Personas Against Drift and Jailbreaks

Researchers from MATS, Oxford, and Anthropic introduced the 'assistant axis,' a vector derived from LLM layer outputs that quantifies how closely a model adheres to its trained assistant persona. They developed 'activation capping,' an inference-time method that corrects deviations from this axis when similarity falls below a threshold. Testing on Gemma 2 27B, Qwen3 32B, and Llama 3.3 70B showed harmful response rates to jailbreak prompts dropped by roughly half (e.g., 83% to 41% for Qwen3 32B) without degrading benchmark performance. The technique targets character-based jailbreaks that bypass system prompts by manipulating a model's internal representational state.

Evaluation and Benchmarking · AI Safety Research · Gemma 2 9B · assistant axis · Llama 3.1 70B · +12 more
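The summary above describes activation capping as correcting a deviation from the assistant axis when similarity falls below a threshold. A minimal sketch of one way such a correction could work follows. It assumes that "similarity" means the projection of a hidden-state vector onto a unit-length axis, and that the correction raises that projection to the threshold. Both are assumptions made for illustration; the researchers' actual formulation may differ, and `cap_activation` is a hypothetical name.

```python
import math


def normalize(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("axis must be non-zero")
    return [x / norm for x in vector]


def cap_activation(hidden, axis, threshold):
    """Raise the projection of `hidden` onto `axis` to at least `threshold`.

    Assumption: the correction shifts `hidden` along the axis direction only,
    leaving all orthogonal components untouched.
    """
    unit = normalize(axis)
    projection = sum(h * u for h, u in zip(hidden, unit))
    if projection >= threshold:
        return list(hidden)  # already on the assistant side; no change
    shortfall = threshold - projection
    return [h + shortfall * u for h, u in zip(hidden, unit)]


# Usage with toy 2-D vectors (invented values).
drifted = cap_activation([0.0, 2.0], [1.0, 0.0], threshold=1.0)
unchanged = cap_activation([3.0, 2.0], [1.0, 0.0], threshold=1.0)
```

Shifting only along the axis is one plausible design choice: it changes the measured quantity while leaving the rest of the representation alone, which would fit the summary's report that benchmark performance did not degrade.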