Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness
Decomposing factual sycophancy in LLMs: size and instruction tuning shape robustness differently
A new arXiv paper decomposes factual sycophancy, in which a model abandons a correct answer under social pressure, into two distinct mechanisms: truth margin (the model's baseline preference for the correct answer) and manipulation sensitivity (how far pressure shifts that preference). Evaluating 56 open-weight models ranging from 0.3B to 32B parameters across 13 manipulation types, the authors find that vulnerability is governed primarily by model size, but that instruction tuning modulates the effect of size: small instruction-tuned models can become less robust than their base counterparts, while large ones typically become more robust. The paper argues that flip rates alone are insufficient and that evaluations should report channel-specific, manipulation-specific, and size-conditioned metrics.
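The decomposition can be made concrete with a small bookkeeping sketch. The summary does not say how the quantities are measured, so this assumes that truth margin is the log-probability of the correct answer minus that of an incorrect one under a neutral prompt, and that manipulation sensitivity is the drop in that margin once a manipulation is applied. The `Trial` record, the helper functions, the manipulation labels ("authority", "repetition") and the size-bucket cut-offs are all hypothetical, and the numbers are made up for illustration, not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One question asked with a neutral prompt and again under one manipulation (hypothetical record)."""
    model_size_b: float      # parameter count in billions
    manipulation: str        # hypothetical label, e.g. "authority"
    base_margin: float       # assumed: logp(correct) - logp(wrong) under a neutral prompt
    pressured_margin: float  # same quantity after the manipulation is applied


def sensitivity(t: Trial) -> float:
    # Positive values mean the pressure moved the model toward the wrong answer.
    return t.base_margin - t.pressured_margin


def flipped(t: Trial) -> bool:
    # A flip: correct answer preferred under the neutral prompt, no longer preferred under pressure.
    return t.base_margin > 0 and t.pressured_margin <= 0


def size_bucket(size_b: float) -> str:
    # Hypothetical cut-offs for size-conditioned reporting.
    if size_b < 1:
        return "<1B"
    if size_b < 10:
        return "1-10B"
    return ">=10B"


def _avg(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def summarize(trials):
    """Mean truth margin, mean sensitivity and flip rate per (manipulation, size bucket)."""
    groups = defaultdict(list)
    for t in trials:
        groups[(t.manipulation, size_bucket(t.model_size_b))].append(t)
    return {
        key: {
            "truth_margin": _avg(t.base_margin for t in ts),
            "sensitivity": _avg(sensitivity(t) for t in ts),
            "flip_rate": _avg(flipped(t) for t in ts),  # bools sum as 0/1
        }
        for key, ts in groups.items()
    }


# Made-up toy numbers: one 0.5B model, the same two questions, two hypothetical manipulations.
toy = [
    Trial(0.5, "authority", 0.4, -0.2),
    Trial(0.5, "authority", 3.0, 2.9),
    Trial(0.5, "repetition", 0.4, 0.1),
    Trial(0.5, "repetition", 3.0, -1.0),
]

if __name__ == "__main__":
    for key, stats in sorted(summarize(toy).items()):
        print(key, stats)
```

In the toy data each manipulation flips exactly one of the two answers, so both report the same flip rate and the same truth margin, yet the hypothetical "repetition" manipulation shifts the margin far more on average. A flip rate alone hides that difference, which is why the sketch keys its report by manipulation and by size bucket.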