PaSBench-Video

benchmark · active · provisional · pasbench-video-2db153d8 · 1 event · first seen 15d ago

Aliases: PaSBench-Video

Recent events (1)

6 · arXiv · cs.CL · 15d ago

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning in MLLMs

PaSBench-Video is a 740-video benchmark that evaluates whether multimodal large language models (MLLMs) can issue timely, accurate safety warnings in the window between a visible danger sign and the accident itself. The videos span four domains (driving, healthcare, daily life, and industrial production) and are annotated at the frame level with risk-onset and accident boundaries, so the task requires causal temporal reasoning rather than static scene classification. Of the 13 MLLMs tested, none exceeds 20% on the strictest metric. Across models, recall is positively correlated with false-positive rate (Pearson r=0.64), which suggests that models rely on scene-level activity cues rather than genuine hazard reasoning. Performance also varies sharply by domain; driving is particularly difficult because routine and hazardous scenes look alike.
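
The summary's two quantitative ideas, the warning window and the recall versus false-positive correlation, can be illustrated with a minimal sketch. This is not the benchmark's scoring code: the function names, the label strings, and the boundary convention (a warning on the risk-onset frame counts as timely, one on the accident frame counts as late) are all assumptions made for illustration.

```python
from math import sqrt
from typing import Optional, Sequence


def classify_warning(warn_frame: Optional[int], risk_onset: int, accident: int) -> str:
    """Label a model's first warning relative to the annotated window.

    Hypothetical helper: the labels and the half-open window
    [risk_onset, accident) are assumptions, not the benchmark's definitions.
    """
    if risk_onset >= accident:
        raise ValueError("risk onset must precede the accident frame")
    if warn_frame is None:
        return "missed"      # the model never warned
    if warn_frame < risk_onset:
        return "premature"   # warned before any danger sign was visible
    if warn_frame < accident:
        return "timely"      # warned inside the actionable window
    return "late"            # warned at or after the accident


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

To reproduce the kind of analysis the summary describes, one would pass per-model recall and false-positive rate as the two sequences to `pearson_r`. A positive coefficient means that models which warn more often catch more hazards but also raise more false alarms.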