ClinHallu

benchmark · active · provisional · clinhallu-cad4fe2d · 1 event · first seen 2d ago

Aliases: ClinHallu

Recent events (1)

5 · arXiv · cs.AI · 2d ago

ClinHallu benchmark diagnoses stage-wise hallucinations in medical multimodal LLM reasoning

Researchers from Alibaba DAMO Academy introduce ClinHallu, a benchmark of 7,031 validated instances designed to identify where hallucinations originate within medical MLLM reasoning pipelines. Each instance is annotated with a structured reasoning trace decomposed into Visual Recognition, Knowledge Recall, and Reasoning Integration stages, with stage-replacement interventions to measure the causal impact of correcting each stage. The paper also demonstrates that trace-supervised fine-tuning reduces stage-wise hallucinations, offering both diagnostic and mitigation value for clinical AI systems.
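
To make the stage-replacement idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `Trace` class, the field names, the `stage_replacement_gain` function, and the toy scorer are all hypothetical, and a real pipeline would presumably re-run the model on the downstream stages after each swap rather than score the patched trace directly. The sketch only illustrates the bookkeeping: swap one stage of a model trace for its reference version, and record how much accuracy changes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

# Hypothetical field names mirroring the three stages named in the summary.
STAGES = ("visual_recognition", "knowledge_recall", "reasoning_integration")


@dataclass(frozen=True)
class Trace:
    """One structured reasoning trace (illustrative schema, not the benchmark's)."""
    visual_recognition: str
    knowledge_recall: str
    reasoning_integration: str


def stage_replacement_gain(
    model_traces: List[Trace],
    reference_traces: List[Trace],
    is_correct: Callable[[Trace], bool],
) -> Dict[str, float]:
    """Accuracy change when one stage at a time is replaced by its reference."""
    n = len(model_traces)
    baseline = sum(is_correct(t) for t in model_traces) / n
    gains: Dict[str, float] = {}
    for stage in STAGES:
        # Swap only this stage; the other two stay as the model produced them.
        patched = [
            replace(m, **{stage: getattr(r, stage)})
            for m, r in zip(model_traces, reference_traces)
        ]
        gains[stage] = sum(is_correct(t) for t in patched) / n - baseline
    return gains


# Toy data: the model's only error is in the visual stage.
reference = Trace("opacity in left lower lobe",
                  "opacity suggests pneumonia",
                  "diagnosis: pneumonia")
model = Trace("clear lungs",
              "opacity suggests pneumonia",
              "diagnosis: pneumonia")


def toy_is_correct(trace: Trace) -> bool:
    # Stand-in scorer (assumption): counts a trace as correct if the finding was seen.
    return "opacity" in trace.visual_recognition


gains = stage_replacement_gain([model], [reference], toy_is_correct)
```

In this toy case only the visual stage shows a positive gain, which is the kind of signal a stage-wise diagnosis is meant to surface: the stage whose correction helps most is the likeliest origin of the error.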