Entity · benchmark

MVBench

benchmark · active · mvbench-b4e3d6fd · 1 event · first seen May 26, 2026

Aliases: MVBench

Co-occurring entities

STORMS · TempCompass · Chain-of-Thought Reasoning · MMVU · VideoMME · latent chain-of-thought

More like this (12)

VBench · LVBench · MMBench2 · VerifierBench · MTBench · MemBench · VR-Bench · VBench I2V · AdvBench · VLABench · VisAnomBench · IVEBench

Recent events (1)

6 · arXiv · cs.CL · May 26, 2026

STORM: Internalized Spatial-Temporal Reasoning for Video-Language Models via Latent Trajectories

STORM is a two-stage training framework that teaches large vision-language models to perform spatial-temporal video reasoning through bounded continuous latent trajectories rather than explicit textual chain-of-thought, keyframe selection, or external tool use. In Stage I, latent tokens are aligned with thought-video representations derived from generated videos; in Stage II, answer-only supervision internalizes the reasoning process. At inference time, no video regeneration or frame reinsertion is required, which reduces latency and engineering complexity. Evaluations on VideoMME, MVBench, TempCompass, and MMVU show improved accuracy and substantially lower inference overhead than tool-based pipelines.

Inference Economics · Agent and Tool Ecosystem · MVBench · STORMS · TempCompass · +5 more