Almanac
technique

Pearson correlation

technique · active · provisional · pearson-correlation-1a8dd62b · 1 event · first seen 15d ago

Aliases: Pearson correlation

Recent events (1)

6 · arXiv · cs.CL · 15d ago

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning in MLLMs

PaSBench-Video is a 740-video benchmark that evaluates whether multimodal large language models (MLLMs) can issue timely, accurate safety warnings in the window between a visible danger sign and the accident itself. Videos span four domains (driving, healthcare, daily life, industrial production) and are annotated with frame-level risk-onset and accident boundaries, so the task requires causal temporal reasoning rather than static scene classification. Across the 13 MLLMs tested, no model exceeds 20% on the strictest metric, and recall is strongly coupled to false-positive rate (Pearson r = 0.64), indicating that models rely on scene-level activity cues rather than genuine hazard reasoning. Performance varies sharply by domain; driving is particularly difficult because routine and hazardous scenes look alike.
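For readers unfamiliar with the statistic, the following is a minimal sketch of how a Pearson r between per-model recall and false-positive rate could be computed. It is not the paper's code. The function name `pearson_r` and the five data points are hypothetical placeholders chosen for illustration; they are not values from PaSBench-Video.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations of each observation from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Sum of cross-products and sums of squares; the 1/n factors cancel.
    cov = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical per-model scores (one entry per model), NOT from the paper.
recall = [0.10, 0.25, 0.40, 0.55, 0.70]
false_positive_rate = [0.05, 0.30, 0.20, 0.45, 0.50]

r = pearson_r(recall, false_positive_rate)
print(f"Pearson r = {r:.2f}")
```

A value near +1 means models with higher recall also tend to have higher false-positive rates, a value near 0 means no linear relationship, and a value near -1 means the two move in opposite directions. With only a handful of models, as in this sketch, the estimate is sensitive to individual points.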