benchmark

SHOVIR

benchmark · active · provisional · shovir-0cb6575c · 1 event · first seen 14h ago

Aliases: SHOVIR

Co-occurring entities

CheXpert Plus MIMIC-CXR PadChest-GR

More like this (12)

COVAN ScIRGen MVOIK-4D SAIR TEVI P4IR SHERLOC SV-Detect HTV-Agent HiViG MMVU AGENTSERVESIM

Recent events (1)

5 · arXiv · cs.CL · 14h ago

SHOVIR benchmark exposes vision shortcut learning failures in radiology report generation VLMs

Researchers introduce SHOVIR, a benchmark for detecting 'vision shortcut' behavior in Vision-Language Models (VLMs) applied to Radiology Report Generation (RRG), where models achieve high scores by exploiting learned priors rather than actual image evidence. The benchmark extends MIMIC-CXR and PadChest-GR with per-box CheXpert labels and uses localized occlusion experiments to isolate two failure modes: direct shortcuts (a finding is still reported after its visual evidence is removed) and contextual shortcuts (detection of a finding degrades when co-occurring pathologies are occluded). Evaluating eight state-of-the-art VLMs, the authors find that high report quality does not correlate with strong spatial grounding, revealing a systematic blind spot in current RRG evaluation protocols.
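
The two failure modes can be made concrete with a small sketch. This is only an illustration of the idea as summarized above, not the benchmark's actual metric or code: the `OcclusionTrial` record, the `shortcut_rates` function, and the rate definitions are all assumptions introduced here. Each trial records whether a model reported a finding on the original image, after that finding's own box was occluded, and after the box of a co-occurring pathology was occluded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OcclusionTrial:
    """One finding under three conditions (hypothetical record layout)."""
    finding: str
    detected_original: bool          # reported on the unmodified image
    detected_target_occluded: bool   # reported after its own box is masked
    detected_context_occluded: bool  # reported after a co-occurring box is masked


def shortcut_rates(trials: List[OcclusionTrial]) -> Dict[str, float]:
    """Illustrative rates, computed only over findings detected on the original image.

    direct:     fraction still reported once their own evidence is removed.
    contextual: fraction lost once a co-occurring pathology is removed.
    """
    base = [t for t in trials if t.detected_original]
    if not base:
        return {"direct": 0.0, "contextual": 0.0}
    direct = sum(t.detected_target_occluded for t in base) / len(base)
    contextual = sum(not t.detected_context_occluded for t in base) / len(base)
    return {"direct": direct, "contextual": contextual}
```

Under these assumed definitions, a well-grounded model would score near zero on both rates: its findings would disappear when their own evidence is masked and remain when only neighboring pathologies are masked.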

Evaluation and Benchmarking Multimodal Progress CheXpert Plus MIMIC-CXR PadChest-GR +1 more