Entity · benchmark

PaSBench-Video

benchmark · active · pasbench-video-2db153d8 · 1 event · first seen Jun 2, 2026

Aliases: PaSBench-Video

Co-occurring entities

Multimodal Large Language Models · Pearson correlation

More like this (12)

SPBench · APS-Bench · ProgramBench · WSADBench · VSI-Bench · GraphVid-Bench · ParaPairAudioBench · VR-Bench · PowerCodeBench · SAM Audio-Bench · SorryBench · RepoBench

Recent events (1)

arXiv · cs.CL · Jun 2, 2026

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning in MLLMs

PaSBench-Video is a 740-video benchmark that evaluates whether multimodal large language models (MLLMs) can issue timely, accurate safety warnings in the window between the first visible sign of danger and the accident itself. The videos span four domains (driving, healthcare, daily life, and industrial production) and are annotated with frame-level risk-onset and accident boundaries, so the task requires causal temporal reasoning rather than static scene classification. Of the 13 MLLMs tested, none exceeds 20% on the strictest metric, and recall is strongly coupled to false-positive rate (Pearson r = 0.64), indicating that models rely on scene-level activity cues rather than genuine hazard reasoning. Performance varies sharply by domain; driving is especially difficult because routine and hazardous driving scenes look visually similar.
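The summary does not define its metrics, so the following is only a minimal Python sketch of two ideas it mentions, under stated assumptions: a warning is scored as timely if it falls at or after the annotated risk-onset frame and before the accident frame, and a false positive is any warning on a clip with no annotated hazard. The `Clip` schema and the functions `is_timely`, `recall_and_fpr`, and `pearson` are hypothetical names for illustration; this is not PaSBench-Video's evaluation code and not its strictest metric.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence


@dataclass
class Clip:
    # Hypothetical annotation schema (assumption): safe control clips have
    # no risk-onset or accident frame.
    risk_onset: Optional[int]
    accident: Optional[int]


def is_timely(warning_frame: Optional[int], clip: Clip) -> bool:
    """Assumed rule: a warning counts only if risk_onset <= frame < accident."""
    if warning_frame is None or clip.risk_onset is None or clip.accident is None:
        return False
    return clip.risk_onset <= warning_frame < clip.accident


def recall_and_fpr(
    clips: Sequence[Clip], warnings: Sequence[Optional[int]]
) -> tuple[float, float]:
    """Recall over hazardous clips; false-positive rate over safe clips.

    Assumes at least one hazardous and one safe clip. `warnings[i]` is the
    frame at which the model first warned on clips[i], or None if it never did.
    """
    hits = hazardous = false_alarms = safe = 0
    for clip, w in zip(clips, warnings):
        if clip.risk_onset is None:
            safe += 1
            false_alarms += w is not None  # any warning on a safe clip
        else:
            hazardous += 1
            hits += is_timely(w, clip)  # early or late warnings do not count
    return hits / hazardous, false_alarms / safe


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Given one (recall, FPR) pair per model from `recall_and_fpr`, `pearson(recalls, fprs)` yields a correlation of the kind the summary reports, assuming it is computed across models. A positive r means that models catching more hazards also raise more false alarms, which is consistent with warnings being triggered by general activity rather than by the hazard itself.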

Evaluation and Benchmarking · AI Safety Research · Multimodal Large Language Models · PaSBench-Video · Pearson correlation