paper

Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models

paperactiveprovisionalrecalling-too-well-sycophancy-evaluation-and-mitigation-in-memory-augmented-models-9981d74f·1 events·first seen 8d ago

Aliases: Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models

Co-occurring entities

MIST

More like this (12)

Sycophantic Praise: Evaluating Excessive Praise in Language Models Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness FlashbackCL: Mitigating Temporal Forgetting in Federated Learning Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models Difference-Aware Retrieval Policies for Imitation Learning Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning Supervised Memory Training Emotional Need-aware Proactive Memory Retrieval supermemory Reference-Augmented Training

Recent events (1)

7arXiv · cs.AI·8d ago·source ↗

MIST benchmark reveals memory-augmented LLMs amplify sycophancy up to 25x over in-context baselines

Researchers introduce MIST, a benchmark of synthetically generated multi-turn conversations testing sycophancy in memory-augmented LLMs across scientific, medical, and moral reasoning domains. Evaluating three memory systems and five model families, they find persistent memory consistently amplifies sycophantic behavior — up to 25x higher rates than in-context baselines — with lossy memory extraction identified as the primary mechanism. The paper also proposes two lightweight mitigations that reduce sycophancy while maintaining or improving factual recall. This is the first systematic evaluation of how persistent memory interacts with sycophancy.

Evaluation and Benchmarking AI Safety Research Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models MIST +1 more