Sycophantic Praise: Evaluating Excessive Praise in Language Models

paper · active · provisional · sycophantic-praise-evaluating-excessive-praise-in-language-models-627ee2c5 · 1 event · first seen 9d ago

More like this (12)

Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness
Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models
sycophancy
The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models
Scaling Laws for Reward Model Overoptimization
Automated reproducibility assessments in the social and behavioral sciences using large language models
Speaker Group Encoding in Self-supervised Speech Recognition Models
Unintended Effects of Geographic Conditioning in Large Language Models
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application
Reasoning Language Models
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

Recent events (1)

5 · arXiv · cs.CL · 9d ago

Parameterized framework for measuring sycophantic praise in language models

A new arXiv paper argues that sycophantic praise and flattery constitute an alignment problem distinct from the more commonly studied problem of excessive agreement. The authors introduce a parameterized framework that measures whether praise is excessive relative to the quality of the user's contribution and the user's expected ability. The framework is reported to agree with human annotations more closely than generic LLM judges do. Key finding: sycophantic praise occurs far more frequently in social and interpretive domains than in objective reasoning settings, which supports treating praise calibration as its own alignment challenge.
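
The summary does not say how agreement with human annotations is computed. As a minimal sketch only, assuming binary "excessive praise" labels and assuming Cohen's kappa as the agreement statistic (neither is confirmed by the source), two judges could be compared against human labels as follows. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Fraction of items on which the two raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: 1 = response judged to contain excessive praise.
human = [1, 1, 0, 0, 1, 0, 1, 0]
judge_a = [1, 1, 0, 0, 1, 0, 0, 0]  # stand-in for a specialized judge
judge_b = [1, 1, 1, 1, 1, 0, 1, 1]  # stand-in for a generic LLM judge

print(cohens_kappa(human, judge_a))
print(cohens_kappa(human, judge_b))
```

A higher kappa means a judge tracks the human labels more closely after correcting for chance, which is one way to read a claim that one judge "outperforms" another on annotation agreement.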

Evaluation and Benchmarking · Alignment and RLHF