ClinHallu (benchmark · active · provisional)
ID: clinhallu-cad4fe2d · 1 event · first seen 2d ago · Aliases: ClinHallu
Recent events (1)
ClinHallu benchmark diagnoses stage-wise hallucinations in medical multimodal LLM reasoning
Researchers from Alibaba DAMO Academy introduce ClinHallu, a benchmark of 7,031 validated instances designed to locate where hallucinations originate within the reasoning pipelines of medical multimodal large language models (MLLMs). Each instance is annotated with a structured reasoning trace decomposed into three stages (Visual Recognition, Knowledge Recall, and Reasoning Integration) and paired with stage-replacement interventions that measure the causal impact of correcting each stage. The paper also shows that trace-supervised fine-tuning reduces stage-wise hallucinations, giving the benchmark both diagnostic and mitigation value for clinical AI systems.
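The summary does not spell out the intervention protocol, so the following is only a minimal sketch of how a stage-replacement analysis could work, under assumed definitions. All names (`Trace`, `Instance`, `stage_replacement_effect`, `answer_fn`) are hypothetical. `answer_fn` stands in for re-running the model conditioned on an edited trace, and the per-stage "fix rate" (the share of initially wrong instances that become correct when one stage is swapped for its validated version) is an illustrative metric, not necessarily the one ClinHallu reports.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

# The three annotated stages, in pipeline order.
STAGES = ("visual_recognition", "knowledge_recall", "reasoning_integration")


@dataclass(frozen=True)
class Trace:
    """Hypothetical structured reasoning trace with one field per stage."""
    visual_recognition: str
    knowledge_recall: str
    reasoning_integration: str


@dataclass(frozen=True)
class Instance:
    """Hypothetical benchmark item: model trace, validated trace, answer."""
    model_trace: Trace
    gold_trace: Trace
    gold_answer: str


def stage_replacement_effect(
    instances: List[Instance],
    answer_fn: Callable[[Trace], str],
) -> Dict[str, float]:
    """For each stage, return the fraction of initially wrong instances that
    become correct when only that stage is replaced by its gold version."""
    wrong = [x for x in instances if answer_fn(x.model_trace) != x.gold_answer]
    effects: Dict[str, float] = {}
    for stage in STAGES:
        if not wrong:
            effects[stage] = 0.0
            continue
        fixed = 0
        for x in wrong:
            # Intervene on a single stage, leaving the other two untouched.
            patched = replace(x.model_trace, **{stage: getattr(x.gold_trace, stage)})
            if answer_fn(patched) == x.gold_answer:
                fixed += 1
        effects[stage] = fixed / len(wrong)
    return effects


if __name__ == "__main__":
    # Toy stand-in for the model: the answer depends only on visual findings.
    def toy_answer(t: Trace) -> str:
        return "pneumonia" if "consolidation" in t.visual_recognition else "normal"

    item = Instance(
        model_trace=Trace("clear lungs", "k", "r"),
        gold_trace=Trace("right lower lobe consolidation", "k", "r"),
        gold_answer="pneumonia",
    )
    print(stage_replacement_effect([item], toy_answer))
    # → {'visual_recognition': 1.0, 'knowledge_recall': 0.0, 'reasoning_integration': 0.0}
```

In this toy case only correcting the Visual Recognition stage flips the answer, so the error would be attributed to that stage. Swapping one stage at a time while holding the others fixed is what lets the effect be read causally rather than as a correlation.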