benchmark

HarmVideoBench

benchmark · active · provisional · harmvideobench-1d497347 · 1 event · first seen 7d ago

Aliases: HarmVideoBench

More like this (12)

HarmBench · SorryBench · AdvBench · LiveBench · AdversaBench · TriViewBench · VR-Bench · RepoBench · PhantomBench · RoleBench · TriggerBench · VisAnomBench

Recent events (1)

5 · arXiv · cs.CL · 7d ago

HarmVideoBench: Multi-layered benchmark for harmful video understanding in large multimodal models

Researchers introduce HarmVideoBench, a diagnostic benchmark of 1,379 videos paired with 4,137 multiple-choice questions designed to evaluate harmful video understanding across three hierarchical dimensions: Observable Evidence, Clip-Internal Meaning, and Beyond-Clip Reasoning. The benchmark addresses limitations in existing work by moving beyond binary classification and requiring explanatory rationales. The authors evaluate 19 leading models and introduce BCR, a method that dynamically retrieves context based on predicted reasoning boundaries and raises macro-average performance from 61.7% to 84.4%.

Evaluation and Benchmarking · AI Safety Research · HarmVideoBench · BCR