paper
Sycophantic Praise: Evaluating Excessive Praise in Language Models
paperactiveprovisional
sycophantic-praise-evaluating-excessive-praise-in-language-models-627ee2c5·1 events·first seen 9d agoAliases: Sycophantic Praise: Evaluating Excessive Praise in Language Models
More like this (12)
Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape RobustnessRecalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented ModelssycophancyThe Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language ModelsScaling Laws for Reward Model OveroptimizationAutomated reproducibility assessments in the social and behavioral sciences using large language modelsSpeaker Group Encoding in Self-supervised Speech Recognition ModelsUnintended Effects of Geographic Conditioning in Large Language ModelsThe Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language ModelAgentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and ApplicationReasoning Language ModelsEvaluation Cards: An Interpretive Layer for AI Evaluation Reporting
Recent events (1)
Parameterized framework for measuring sycophantic praise in language models
A new arXiv paper argues that sycophantic praise and flattery constitute a distinct alignment problem separate from the more commonly studied excessive agreement. The authors introduce a parameterized framework that measures whether praise is excessive relative to contribution quality and expected user ability, outperforming generic LLM judges on human annotation agreement. Key finding: sycophantic praise occurs far more frequently in social and interpretive domains than in objective reasoning settings, positioning praise calibration as a distinct alignment challenge.