paper
Native Active Perception as Reasoning for Omni-Modal Understanding
paper · active · provisional
native-active-perception-as-reasoning-for-omni-modal-understanding-d95d1994 · 1 event · first seen 3d ago
Aliases: Native Active Perception as Reasoning for Omni-Modal Understanding
More like this (12)
Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning
Adaptive Parallel Reasoning
spatio-temporal dynamic reasoning
Pragmatic Reasoning
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
hybrid reasoning
Reasoning Language Models
latent reasoning
Latent World Recovery for Multimodal Learning with Missing Modalities
Long-context Reasoning Benchmarks
Natural Language Inference
multi-hop reasoning
Recent events (1)
OmniAgent: POMDP-based active perception agent for long video understanding with test-time scaling
Researchers introduce OmniAgent, a multimodal agent that reformulates long video understanding as a partially observable Markov decision process (POMDP) solved through an iterative Observation-Thought-Action cycle. Rather than processing all frames uniformly, the agent selectively distills audio-visual cues into a persistent textual memory. Training combines Agentic Supervised Fine-Tuning with a novel reinforcement learning method (TAURA) that uses turn-level entropy for credit assignment. OmniAgent demonstrates positive test-time scaling and achieves state-of-the-art results among open-source models across ten benchmarks, with its 7B model outperforming Qwen2.5-VL-72B on LVBench (50.5% vs. 47.3%).
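The Observation-Thought-Action cycle described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not OmniAgent's implementation: `Segment`, `AgentState`, `toy_policy`, and `run` are hypothetical names, the "video" is a list of text cues, and a keyword match stands in for the learned policy. It shows only the general idea of inspecting detailed visual content selectively and keeping distilled text notes in a persistent memory; it does not model TAURA or the fine-tuning stages.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    audio: str   # cheap, coarse cue (assumed: a transcript snippet)
    visual: str  # costly, detailed cue (assumed: a frame caption)


@dataclass
class AgentState:
    memory: list = field(default_factory=list)  # persistent textual notes
    inspected: int = 0                          # number of detailed visual looks


def toy_policy(memory, cursor, n_segments):
    """Hypothetical stand-in for a learned policy: returns (action, argument)."""
    if memory:                       # enough evidence gathered: answer from memory
        return "answer", memory[-1]
    if cursor >= n_segments:         # nothing left to perceive
        return "answer", None
    return "listen", cursor          # otherwise probe the next segment's audio


def run(segments, keyword, max_turns=10):
    """Run the loop; returns (answer, turns_used, final_state)."""
    state = AgentState()
    cursor = 0
    for turn in range(1, max_turns + 1):
        # Thought: decide what to do from the question and memory alone.
        action, arg = toy_policy(state.memory, cursor, len(segments))
        if action == "answer":
            return arg, turn, state
        # Action + Observation: probe the cheap audio cue first.
        segment = segments[arg]
        if keyword in segment.audio:
            # Only now pay for the detailed look, and store it as text.
            state.memory.append(f"visual@{arg}: {segment.visual}")
            state.inspected += 1
        cursor += 1
    # Turn budget exhausted: answer with the best note available, if any.
    return (state.memory[-1] if state.memory else None), max_turns, state


segments = [
    Segment("crowd noise", "people walking"),
    Segment("a dog barks", "a brown dog near a gate"),
    Segment("music", "a band on stage"),
]
answer, turns, state = run(segments, "dog")
print(answer, turns, state.inspected)
```

In this toy run only one of the three segments receives a detailed visual look, which is the contrast with uniform frame processing that the summary draws. Raising `max_turns` gives the loop more perception steps before it must answer, a loose analogue of spending more compute at test time.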