
Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness

paper · active · provisional · decomposing-factual-sycophancy-in-language-models-how-size-and-instruction-tuning-shape-robustness-ab2a15b6 · 1 event · first seen 11d ago



Recent events (1)

arXiv · cs.CL · 11d ago

Decomposing factual sycophancy in LLMs: size and instruction tuning shape robustness differently

A new arXiv paper decomposes factual sycophancy, in which a model abandons a correct answer under social pressure, into two distinct mechanisms: truth margin (the model's baseline preference for the correct answer) and manipulation sensitivity (how much pressure shifts that preference). Evaluating 56 open-weight models from 0.3B to 32B parameters across 13 manipulation types, the authors find that vulnerability is governed primarily by model size, but that instruction tuning modulates how size acts: small instruction-tuned models can become less robust, while large ones typically become more robust. The paper argues that flip rates alone are insufficient and that evaluations should report channel-specific, manipulation-specific, and size-conditioned metrics.
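
To make the decomposition concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not give the exact definitions, so this sketch assumes that truth margin is the log-probability gap between the correct and an incorrect answer without pressure, and that manipulation sensitivity is the drop in that gap once pressure is applied. The names `Item`, `truth_margin`, `manipulation_shift`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    """Log-probabilities for one question (hypothetical record layout)."""
    logp_correct_base: float       # correct answer, no pressure
    logp_wrong_base: float         # incorrect answer, no pressure
    logp_correct_pressured: float  # correct answer, under manipulation
    logp_wrong_pressured: float    # incorrect answer, under manipulation


def truth_margin(item: Item) -> float:
    # Assumed definition: baseline preference for the correct answer.
    return item.logp_correct_base - item.logp_wrong_base


def pressured_margin(item: Item) -> float:
    return item.logp_correct_pressured - item.logp_wrong_pressured


def manipulation_shift(item: Item) -> float:
    # Assumed definition: how far pressure moves the margin toward the wrong answer.
    return truth_margin(item) - pressured_margin(item)


def flipped(item: Item) -> bool:
    # A flip: the model preferred the correct answer at baseline but not under pressure.
    return truth_margin(item) > 0 and pressured_margin(item) < 0


def summarize(items: List[Item]) -> Dict[str, float]:
    """Report the flip rate alongside the two assumed components."""
    n = len(items)
    baseline_correct = [i for i in items if truth_margin(i) > 0]
    flips = sum(1 for i in baseline_correct if flipped(i))
    return {
        "mean_truth_margin": sum(truth_margin(i) for i in items) / n,
        "mean_manipulation_shift": sum(manipulation_shift(i) for i in items) / n,
        "flip_rate": flips / len(baseline_correct) if baseline_correct else 0.0,
    }
```

Under these assumptions, an item flips only when the shift exceeds the margin, so two models can share a flip rate while differing in why they flip: one because its margins are thin, the other because pressure moves it a long way. That is one way to read the claim that flip rates alone are insufficient.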