Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

paper · active · provisional · measuring-epistemic-resilience-of-llms-under-misleading-medical-context-d10c8be9 · 1 event · first seen 6d ago

Recent events (1)

arXiv · cs.CL · 6d ago

MedMisBench: LLMs show fragile epistemic resilience under misleading medical context

Researchers introduce MedMisBench, a benchmark of 10,932 medical questions paired with 48,889 misleading context injections, to measure whether LLMs maintain correct medical judgment under adversarial pressure. Across 11 model configurations, mean accuracy drops from 71.1% to 38.0% when misleading context is injected, and authority-framed falsehoods reach a 69.5% attack success rate. A 14-member international clinical panel flagged serious potential harm in 38.2% of reviewed cases. The authors argue that existing medical benchmarks measure knowledge but not robustness to manipulation, which leaves a structural gap in LLM safety evaluation for healthcare.
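
The headline numbers above can be related with a little arithmetic. The Python sketch below is illustrative only: the `Trial` record and the function names are hypothetical, and the definition of attack success used here (the share of initially correct answers that become incorrect once misleading context is injected) is an assumption, since the summary does not state how the paper defines it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Trial:
    """One question answered twice (hypothetical record layout)."""
    correct_clean: bool   # answer was correct without injected context
    correct_misled: bool  # answer was correct with misleading context


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def attack_success_rate(trials: List[Trial]) -> float:
    """ASSUMED definition: among initially correct answers,
    the fraction that flip to incorrect under misleading context."""
    initially_correct = [t for t in trials if t.correct_clean]
    if not initially_correct:
        return 0.0
    flipped = sum(1 for t in initially_correct if not t.correct_misled)
    return flipped / len(initially_correct)


def accuracy_drop(clean_pct: float, misled_pct: float) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = clean_pct - misled_pct
    relative = 100.0 * absolute / clean_pct
    return round(absolute, 1), round(relative, 1)


# Reported means from the summary: 71.1% clean, 38.0% with misleading context.
points, relative = accuracy_drop(71.1, 38.0)

# Toy data, not from the benchmark.
toy = [
    Trial(True, True),
    Trial(True, False),
    Trial(True, False),
    Trial(False, False),
]
clean_acc = accuracy(t.correct_clean for t in toy)
misled_acc = accuracy(t.correct_misled for t in toy)
toy_asr = attack_success_rate(toy)
```

Applied to the reported means, `accuracy_drop(71.1, 38.0)` gives a 33.1-point absolute drop, or roughly 46.6% of the clean accuracy lost. The toy trials exist only to exercise the functions and say nothing about MedMisBench itself.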