To Compare, or Not to Compare: On Methodological Practices in Evaluating Social Bias
Unified framework reveals systematic bias amplification in comparative LLM evaluation settings
A new arXiv paper introduces a unified framework that standardizes social bias benchmarks across two evaluation settings: isolated assessment and forced-choice comparison. The study reports a large 'paradigm gap': comparative settings strongly amplify latent discrimination relative to isolated assessments, and Chain-of-Thought reasoning exacerbates the effect rather than mitigating it. The comparative bias persists even when models are given neutral fallback options or claim to answer randomly, and it grows with model size. The authors recommend comparative settings for auditing but warn practitioners against deploying comparative setups in ambiguous real-world tasks.
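The summary does not state how the paper quantifies bias in each setting, so the following is only a minimal sketch of how the two settings could be compared. The function names, the gap definitions, and the toy inputs are all assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter


def isolated_gap(scores_a, scores_b):
    """Hypothetical isolated-setting measure: difference in mean score
    when each group is rated on its own, one prompt per group."""
    return sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)


def comparative_gap(choices, group_a="A", group_b="B"):
    """Hypothetical forced-choice measure: difference in selection rates.

    Any other answer (for example a neutral fallback such as "unknown")
    counts toward the denominator but toward neither group.
    """
    counts = Counter(choices)
    return (counts[group_a] - counts[group_b]) / len(choices)


# Toy, hand-written inputs for illustration only; not data from the paper.
isolated_a = [0.5, 0.5, 0.5, 0.5]
isolated_b = [0.5, 0.5, 0.5, 0.5]
forced = ["A", "A", "A", "B", "unknown", "A", "B", "A"]

print("isolated gap:", isolated_gap(isolated_a, isolated_b))
print("comparative gap:", comparative_gap(forced))
```

Under these assumed definitions, a model can show no gap when rating each group separately and still favor one group when forced to choose between them, which is the kind of discrepancy the summary calls a paradigm gap.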