Entity · benchmark

VideoMME

benchmark · active · videomme-85fcce66 · 2 events · first seen May 26, 2026

Aliases: VideoMME

Co-occurring entities

OmniAgent · Qwen2.5-VL-72B · LVBench · Native Active Perception as Reasoning for Omni-Modal Understanding · TAURA · MVBench · STORMS · TempCompass · Chain-of-Thought Reasoning · MMVU · latent chain-of-thought

More like this (12)

VideoMV · MMVU · MMMU · RoboMME · MM-EPC · MMAE · MMMU-Pro · MMSU · VideoMLA · Moment-Video · MAMS · MMS (Massively Multilingual Speech)

Recent events (2)

6 · arXiv · cs.CL · Jun 18, 2026

OmniAgent: POMDP-based active perception agent for long video understanding with test-time scaling

Researchers introduce OmniAgent, a multimodal agent that reformulates long video understanding as a POMDP-based iterative Observation-Thought-Action cycle, selectively distilling audio-visual cues into persistent textual memory rather than processing all frames uniformly. The system uses Agentic Supervised Fine-Tuning and a novel reinforcement learning method (TAURA) with turn-level entropy for credit assignment. OmniAgent demonstrates positive test-time scaling and achieves state-of-the-art open-source results across ten benchmarks, with its 7B model outperforming Qwen2.5-VL-72B on LVBench (50.5% vs. 47.3%).

Inference Economics · Agent and Tool Ecosystem · OmniAgent · Qwen2.5-VL-72B · LVBench · +4 more
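
The Observation-Thought-Action cycle described in the OmniAgent summary can be pictured with a small sketch. This is not OmniAgent's implementation: the names `AgentState`, `run_episode`, `toy_policy`, and `toy_observe` are hypothetical, the "policy" is a hand-written rule rather than a trained model, and the "observations" are canned captions standing in for audio-visual perception. It only illustrates the idea of inspecting selected segments and distilling what was seen into a persistent textual memory instead of processing every frame.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AgentState:
    """What the agent carries between turns (hypothetical structure)."""
    question: str
    memory: List[str] = field(default_factory=list)  # persistent textual notes


# Hypothetical interfaces: a policy returns (thought, action, argument);
# an observer turns a segment label into a textual observation.
Policy = Callable[[AgentState], Tuple[str, str, str]]
Observe = Callable[[str], str]


def run_episode(question: str, policy: Policy, observe: Observe,
                max_turns: int = 8) -> Tuple[Optional[str], List[str]]:
    """Iterate thought -> action -> observation until the policy answers."""
    state = AgentState(question)
    for _ in range(max_turns):
        _thought, action, arg = policy(state)
        if action == "answer":
            return arg, state.memory
        # Any other action is treated as "observe this segment".
        note = observe(arg)
        state.memory.append(f"{arg}: {note}")  # distill into text memory
    return None, state.memory  # turn budget exhausted without an answer


def toy_observe(segment: str) -> str:
    """Canned captions standing in for real audio-visual perception."""
    captions = {
        "0-60s": "a street with no traffic",
        "60-120s": "a red car passes",
    }
    return captions.get(segment, "nothing notable")


def toy_policy(state: AgentState) -> Tuple[str, str, str]:
    """Hand-written rule: scan segments in order, answer once evidence appears."""
    if any("red car" in note for note in state.memory):
        return ("evidence found", "answer", "red")
    segments = ["0-60s", "60-120s", "120-180s"]
    next_segment = segments[min(len(state.memory), len(segments) - 1)]
    return ("need more evidence", "observe", next_segment)


answer, memory = run_episode("What color is the car?", toy_policy, toy_observe)
print(answer)
print(memory)
```

In this toy run the agent stops after inspecting two segments, so the third segment is never examined. The `max_turns` budget is an assumed knob for how much computation is spent at inference time; the summary's test-time scaling claim is about the real system and is not demonstrated by this sketch.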

6 · arXiv · cs.CL · May 26, 2026

STORM: Internalized Spatial-Temporal Reasoning for Video-Language Models via Latent Trajectories

STORMS is a two-stage training framework that teaches large vision-language models to perform spatial-temporal video reasoning through bounded continuous latent trajectories rather than explicit textual chain-of-thought, keyframe selection, or external tool use. In Stage I, latent tokens are aligned with thought-video representations derived from generated videos; in Stage II, answer-only supervision internalizes the reasoning process. At inference time, no video regeneration or frame reinsertion is required, reducing latency and engineering complexity. Evaluations on VideoMME, MVBench, TempCompass, and MMVU show improved accuracy with substantially lower inference overhead versus tool-based pipelines.

Inference Economics · Agent and Tool Ecosystem · MVBench · STORMS · TempCompass · +5 more