Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship
self-preference-is-weak-or-absent-in-verifiable-instruction-following-revision-a-four-model-test-under-genuine-authorship-242471f1·1 events·first seen 47h agoAliases: Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship
Co-occurring entities
More like this (12)
Recent events (1)
Study finds no detectable self-preference bias when LLMs revise their own instruction-following drafts
A new arXiv preprint tests whether LLMs resist valid corrections to their own writing by using IFEval's deterministic verifier to establish ground-truth correctness, bypassing model-as-judge subjectivity. Across four mid-tier model families and 85 author-versus-fresh comparisons, no statistically significant self-preference bias was detected (gap -5.1 pp, 95% CI [-12.9, +2.7]). A qualitative finding shows that when authors do reject verified-good fixes, 97% of stated reasons are substantive flaw-catching rather than preference. The result challenges the assumption that documented self-preference in judging tasks extends to self-revision contexts.