Entity · technique

Pearson correlation

technique · active · pearson-correlation-1a8dd62b · 1 event · first seen Jun 2, 2026

Aliases: Pearson correlation

Co-occurring entities

Multimodal Large Language Models · PaSBench-Video

More like this (12)

Clopper-Pearson · Spearman Rank Correlation · canonical correlation analysis · Clopper-Pearson confidence intervals · Spearman's rho · Fisher score · Coefficient of Variation (CV) · D-Score · Exact Posterior Score · Cohen's d · difference-in-means · Q-statistic

Recent events (1)

6 · arXiv · cs.CL · Jun 2, 2026

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning in MLLMs

PaSBench-Video is a 740-video benchmark designed to evaluate whether multimodal large language models can issue timely, accurate safety warnings during the window between a visible danger sign and an accident. Videos span four domains (driving, healthcare, daily life, industrial production) and are annotated with frame-level risk onset and accident boundaries, requiring causal temporal reasoning rather than static scene classification. Testing 13 MLLMs reveals no model exceeds 20% on the strictest metric, with recall strongly coupled to false-positive rate (Pearson r=0.64), indicating models rely on scene-level activity cues rather than genuine hazard reasoning. Performance varies sharply by domain, with driving being particularly problematic due to visual similarity between routine and hazardous scenes.
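The coupling reported above is a Pearson correlation computed across models between two per-model quantities, recall and false-positive rate. As a minimal sketch of how such a coefficient is obtained, the following computes the sample Pearson r from scratch. The function name `pearson_r` and the five data points are illustrative assumptions, not values or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each observation from its mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squares (the 1/(n-1) factors cancel).
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-model (recall, false-positive rate) pairs, NOT data from the paper.
recall = [0.10, 0.25, 0.40, 0.55, 0.70]
false_positive_rate = [0.05, 0.30, 0.20, 0.45, 0.50]

r = pearson_r(recall, false_positive_rate)
print(f"Pearson r = {r:.2f}")
```

A positive r here means that models with higher recall also tend to raise more false alarms. Pearson r captures only linear association, and with a small number of models (13 in the benchmark) a single outlier can shift the value noticeably.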

Evaluation and Benchmarking · AI Safety Research · Multimodal Large Language Models · PaSBench-Video · Pearson correlation