Entity · paper

Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness

paperactivedecomposing-factual-sycophancy-in-language-models-how-size-and-instruction-tuning-shape-robustness-ab2a15b6·1 events·first seen Jun 5, 2026

Aliases: Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness

More like this (12)

Sycophantic Praise: Evaluating Excessive Praise in Language Models Prompt Design at Scale: How Format, Instruction Count, and Context Length Shape Instruction Adherence and Hallucination in Large Language Models Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact Towards Mechanistically Understanding Why Memorized Knowledge Fails to Generalize in Large Language Model Finetuning Instruction-Tuned Models Locally Reuse Human Syntax More Than Humans Do Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models Language Model Finetuning Tapered Language Models Artificial Epanorthosis: Why large language models overuse a classical rhetorical figure, and how to mitigate it How Does Alignment Tuning Shape Representations of Sycophancy and Related Cue-Induced Biases in LLMs?Knowledgeless Language Models: Suppressing Parametric Recall for Evidence-Grounded Language Modeling Sound Probabilistic Safety Bounds for Large Language Models

Recent events (1)

6arXiv · cs.CL·Jun 5, 2026·source ↗

Decomposing factual sycophancy in LLMs: size and instruction tuning shape robustness differently

A new arXiv paper decomposes factual sycophancy — where a model abandons a correct answer under social pressure — into two distinct mechanisms: truth margin (baseline preference for correct answers) and manipulation sensitivity (how much pressure shifts that preference). Evaluating 56 open-weight models from 0.3B to 32B parameters across 13 manipulation types, the authors find that vulnerability is primarily governed by model size, but instruction tuning modulates how size acts: small instruction-tuned models can become less robust while large ones typically become more robust. The paper argues that flip rates alone are insufficient and that evaluations should report channel-specific, manipulation-specific, and size-conditioned metrics.

Evaluation and Benchmarking Open Weights Progress Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness +1 more