To Compare, or Not to Compare: On Methodological Practices in Evaluating Social Bias

arXiv · cs.CL

Unified framework reveals systematic bias amplification in comparative LLM evaluation settings

A new arXiv paper introduces a unified framework that standardizes social bias benchmarks across two evaluation settings: isolated assessment and forced-choice comparison. The study finds a large "paradigm gap": comparative settings act as aggressive catalysts for latent discrimination, eliciting far more of it than isolated assessments do, and Chain-of-Thought reasoning amplifies this effect rather than mitigating it. Critically, the comparative bias persists even when models are offered a neutral fallback option or claim to answer randomly, and it grows with model size. The authors recommend comparative settings for auditing but warn practitioners against deploying models in comparative setups for ambiguous real-world tasks.
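The summary does not state the paper's metrics, so the following is only a minimal sketch of how a paradigm gap might be quantified, assuming a two-group setup where isolated prompts yield a numeric score per persona and forced-choice prompts yield a label ("A", "B", or any other label treated as the neutral fallback). The function names, the net-preference metric, the gap definition, and the exact binomial randomness check are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

# Illustrative sketch only: metric definitions and names are hypothetical,
# not taken from the paper.


def isolated_bias(scores_a, scores_b):
    """Isolated setting: each persona is scored on its own.
    Returns the difference in mean score (group A minus group B)."""
    return sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)


def comparative_bias(choices, group_a="A", group_b="B"):
    """Forced-choice setting: each answer is group_a, group_b, or a neutral
    fallback (any other label). Returns the net preference for group_a over
    all prompts, in [-1, 1]; 0 means no net preference."""
    counts = Counter(choices)
    return (counts[group_a] - counts[group_b]) / len(choices)


def paradigm_gap(iso, comp):
    """Hypothetical summary: how much larger the comparative bias magnitude
    is than the isolated one (positive means comparison amplifies bias)."""
    return abs(comp) - abs(iso)


def randomness_p_value(choices, group_a="A", group_b="B"):
    """Exact two-sided binomial test of whether the non-neutral choices are
    consistent with a fair coin, e.g. when a model claims to answer randomly."""
    k = sum(1 for c in choices if c == group_a)
    n = k + sum(1 for c in choices if c == group_b)
    if n == 0:
        return 1.0  # only neutral answers: nothing to test
    probs = [math.comb(n, i) * 0.5 ** n for i in range(n + 1)]
    observed = probs[k]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-12)))


if __name__ == "__main__":
    # Toy data: identical isolated scores, but a skewed forced choice.
    iso = isolated_bias([0.7, 0.8], [0.7, 0.8])
    comp = comparative_bias(["A"] * 9 + ["B"] + ["neutral"] * 2)
    print(iso, comp, paradigm_gap(iso, comp))
    print(randomness_p_value(["A"] * 9 + ["B"]))
```

In the toy run, the isolated gap is 0.0 while the forced choice shows a net preference of 8/12 for group A, and a model that picks A in 9 of 10 non-neutral answers gets a p-value of 22/1024 (about 0.021), which would be hard to reconcile with a claim of answering randomly. Counting neutral answers in the denominator of the comparative score is a design choice: it lets a model that always takes the fallback score 0 rather than being excluded.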