HAT-4D
HAT-4D: Agentic framework for 4D multi-object interaction reconstruction from monocular video
HAT-4D is a new agentic framework that reconstructs the 3D geometry, temporal dynamics, and physical interactions of multiple objects from a single monocular video. It aims to make data collection for Embodied AI and Vision-Language-Action (VLA) model training scalable. The system combines vision-language models (VLMs) with a multi-level human-in-the-loop feedback mechanism to resolve depth ambiguities and occlusions without expensive multi-camera rigs. The authors also introduce MVOIK-4D, an open-world benchmark for monocular 4D interaction reconstruction, with a new evaluation protocol that focuses on physical plausibility and temporal consistency. The reported experiments show state-of-the-art performance on most metrics, and fine-tuning downstream models on HAT-4D-generated data improves their results.