benchmark
Helpfulness Consistency
benchmark · active
helpfulness-consistency-bc3f86ba·1 event·first seen 26d ago
Aliases: Helpfulness Consistency
More like this (12)
Sentiment Consistency
consistency training
Consistency Training Can Entrench Misalignment
Social Gaze Consistency
Chain-of-Thought Self-Consistency
operadic consistency
Latent Consistency Models
Dynamic-Probabilistic Consistency Gap
HHH (Helpful, Harmless, Honest)
Knowledge Assist
Political Consistency Training (PCT)
Self-Consistency / Majority Voting
Recent events (1)
Political Consistency Training: Reducing Covert Political Bias in LLMs via RL
Researchers identify a phenomenon they call 'covert political bias' in LLMs, in which models handle politically paired topics asymmetrically across seven identified technique categories. They propose two metrics, Sentiment Consistency and Helpfulness Consistency, to measure this asymmetry. To address it, they introduce Political Consistency Training (PCT), an RL-based method with complementary training paradigms that reduces covert bias while preserving overall helpfulness and generalizing to held-out benchmarks.
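The summary above does not give a formula for Helpfulness Consistency, so the following is only an illustrative sketch of how a paired-topic consistency score could be computed, not the authors' definition. It assumes each politically paired topic yields two helpfulness scores in [0, 1] (one per side of the pair) and reports one minus the mean absolute gap. The function name `helpfulness_consistency`, the score range, and the pair format are all hypothetical.

```python
from statistics import mean


def helpfulness_consistency(pairs):
    """Hypothetical consistency score over paired topics.

    pairs: list of (score_a, score_b) tuples, where each score is an
    assumed helpfulness rating in [0, 1] for one side of a paired topic.
    Returns 1.0 when both sides are always treated identically and
    lower values as the average gap between the two sides grows.
    """
    if not pairs:
        raise ValueError("need at least one topic pair")
    for a, b in pairs:
        if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
            raise ValueError("scores are assumed to lie in [0, 1]")
    # Mean absolute difference between the two sides of each pair.
    gaps = [abs(a - b) for a, b in pairs]
    return 1.0 - mean(gaps)


# Made-up scores for illustration only: the first pair is treated
# symmetrically, the second shows a gap between the two sides.
example_pairs = [(0.9, 0.9), (0.8, 0.4)]
score = helpfulness_consistency(example_pairs)
```

Under this assumed formulation, a score near 1 would indicate symmetric treatment of paired topics, while a low score would flag the kind of asymmetry the metric is meant to surface.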