Entity · paper

Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

paper · active · measuring-epistemic-resilience-of-llms-under-misleading-medical-context-d10c8be9 · 1 event · first seen Jun 11, 2026

Aliases: Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

Co-occurring entities

MedMisBench

More like this (12)

Opaque Epistemic Mediation: How LLM Deployment Configurations Shape the Validation of Pseudo-Science
Reassessing High-Performing LLMs on Polish Medical Exams: True Competence or Bias-Driven Performance?
Clinically Grounded Privacy Evaluation of Medical LMs
Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA
Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking
Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
Metacognition in LLMs: Foundations, Progress, and Opportunities
Metacognition in LLMs: Foundations, Progress, and Opportunities
Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?
Estimating Uncertainty from Reasoning: A Large-Scale Study of Multi- and Crosslingual MCQA Performance in LLMs
Can LLMs Reliably Self-Report Adversarial Prefills, and How?

Recent events (1)

7 · arXiv · cs.CL · Jun 11, 2026

MedMisBench: LLMs show fragile epistemic resilience under misleading medical context

Researchers introduce MedMisBench, a benchmark of 10,932 medical questions paired with 48,889 misleading context injections, to measure whether LLMs maintain correct medical judgment under adversarial pressure. Across 11 model configurations, mean accuracy drops from 71.1% to 38.0% when misleading context is injected, with authority-framed falsehoods achieving 69.5% attack success. A 14-member international clinical panel flagged serious potential harm in 38.2% of reviewed cases. The work argues that existing medical benchmarks measure knowledge but not robustness to manipulation, exposing a structural gap in LLM safety evaluation for healthcare.
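The headline figures above can be restated as a drop in percentage points and as a relative drop. The following is a minimal sketch of that arithmetic using only the two mean accuracies reported in the summary. The function names are hypothetical, and the definition of attack success used here (the share of originally correct answers that flip to incorrect after injection) is an assumption for illustration; the summary does not state how the paper defines it.

```python
def accuracy_drop(baseline_pct: float, attacked_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = baseline_pct - attacked_pct
    relative = 100.0 * absolute / baseline_pct
    return absolute, relative


def attack_success_rate(clean_correct: list[bool], attacked_correct: list[bool]) -> float:
    """Assumed definition (not confirmed by the source): percentage of answers
    that were correct without injected context and incorrect with it."""
    eligible = 0
    flipped = 0
    for before, after in zip(clean_correct, attacked_correct):
        if before:
            eligible += 1
            if not after:
                flipped += 1
    return 100.0 * flipped / eligible if eligible else 0.0


# Mean accuracies reported in the summary: 71.1% clean, 38.0% under injection.
absolute, relative = accuracy_drop(71.1, 38.0)
print(f"{absolute:.1f} points absolute, {relative:.1f}% relative")
```

Under these numbers the mean drop is 33.1 percentage points, or roughly 46.6% of baseline accuracy. The `attack_success_rate` helper is included only to show one plausible way such a rate could be computed from per-question results.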

Evaluation and Benchmarking · AI Safety Research · Measuring Epistemic Resilience of LLMs Under Misleading Medical Context · MedMisBench