Entity · technique

sparse frame sampling

technique · active · sparse-frame-sampling-2f343da0 · 1 event · first seen Jun 2, 2026

Aliases: sparse frame sampling

Co-occurring entities

Multimodal Large Language Models · Moment-Video · Seed-2.0-Pro · temporal visual event understanding · visual-token compression

More like this (12)

defer-to-resample · Graph Sparse Sampling · Min-p sampling · Thinkframes · finite-sample posterior sampling framework · Chain-of-Frame · Adaptive Depth Sparse Framework · intra-frame entropy-guided sparsification · intra-frame token sparsification · Force-Informed Re-Sampling Training · inter-frame token selection · Hyperframes

Recent events (1)

7 · arXiv · cs.AI · Jun 2, 2026

Moment-Video: Benchmark Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events

Moment-Video is a new benchmark of 1,000 human-verified video-QA pairs designed to evaluate how well video multimodal large language models (MLLMs) handle brief, localized visual events that may span only a few frames. The benchmark covers 7 domains and 25 subcategories across four task types: Temporal Occurrence, Temporal Counting, Action Description, and Temporal Reasoning. Evaluation of 33 proprietary and open-source models reveals severe deficiencies: the best model (Seed-2.0-Pro) achieves only 39.6% accuracy, while most open-source models score below 25%. Diagnostic analyses show that denser frame sampling helps but does not resolve the bottleneck, pointing to fundamental limitations in how current video MLLMs represent and preserve transient visual evidence.
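To make the frame-sampling point concrete, here is a minimal Python sketch of uniform sparse frame sampling and a check for whether a brief event is covered by the sampled frames. It is an illustration only, not the benchmark's or any model's actual pipeline: the function names, the midpoint sampling rule, the 900-frame clip, and the event span (frames 400 to 405) are all hypothetical assumptions.

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick frame indices evenly spread over a clip.

    Assumed rule (hypothetical): split the clip into num_samples equal
    segments and take the frame at the midpoint of each segment.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the clip contains.
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(step * (i + 0.5)) for i in range(num_samples)]


def event_is_sampled(indices: list[int], event_start: int, event_end: int) -> bool:
    """Return True if any sampled index lies in [event_start, event_end]."""
    return any(event_start <= i <= event_end for i in indices)


if __name__ == "__main__":
    total_frames = 900                   # hypothetical clip: 30 s at 30 fps
    event_start, event_end = 400, 405    # hypothetical six-frame event

    for budget in (8, 32, 128):
        indices = uniform_frame_indices(total_frames, budget)
        hit = event_is_sampled(indices, event_start, event_end)
        print(f"{budget:>3} frames sampled -> event covered: {hit}")
```

Under these assumptions, budgets of 8 and 32 frames skip the event entirely, while 128 frames include one frame from it. The sketch covers only whether the event reaches the model's input; per the diagnostic analyses summarized above, denser sampling helps but does not by itself resolve the bottleneck.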

Long Context Evolution · Evaluation and Benchmarking · Multimodal Large Language Models · Moment-Video · Seed-2.0-Pro · +4 more