benchmark
SpatialWorld
benchmark · active · provisional
spatialworld-54d344ee · 1 event · first seen 8d ago
Aliases: SpatialWorld
Recent events (1)
SpatialWorld benchmark evaluates interactive spatial reasoning of multimodal agents in real-world tasks
Researchers introduce SpatialWorld, a benchmark for evaluating the interactive spatial understanding of multimodal agents across 760 human-annotated tasks spanning household, travel, and social domains. The benchmark integrates eight simulation backends under a shared protocol and requires agents to operate under vision-only partial observability with egocentric inputs. An evaluation of 15 agents shows that even the strongest model, GPT-5, achieves a task success rate of only 17.4%, exposing significant gaps in active exploration and long-horizon planning. The work also identifies a mismatch between task success and execution efficiency as a key bottleneck for spatial agents.
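
The gap between task success and execution efficiency can be made concrete with a small sketch. The summary does not say how SpatialWorld scores efficiency, so the code below is an assumption: it borrows the success-weighted-by-path-length idea (SPL) from embodied navigation. The `Episode` record, its field names, and the notion of a reference step count are hypothetical and are not taken from the benchmark.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One agent attempt at one task (hypothetical record layout)."""
    success: bool          # did the agent complete the task?
    steps_taken: int       # actions the agent actually executed
    reference_steps: int   # assumed: length of a reference solution


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in task success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def efficiency_weighted_success(episodes: Sequence[Episode]) -> float:
    """SPL-style score: each success is discounted by how many more
    steps the agent used than the reference; failures contribute 0."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.reference_steps / max(e.steps_taken, e.reference_steps)
    return total / len(episodes)


# Illustrative numbers only, not benchmark results.
episodes = [
    Episode(success=True, steps_taken=10, reference_steps=10),
    Episode(success=True, steps_taken=40, reference_steps=10),
    Episode(success=False, steps_taken=50, reference_steps=10),
    Episode(success=False, steps_taken=50, reference_steps=10),
]
print(success_rate(episodes))                 # 0.5
print(efficiency_weighted_success(episodes))  # 0.3125
```

In this toy data the agent succeeds on half of the tasks, but one success takes four times the reference number of steps, so the efficiency-weighted score falls well below the raw success rate. That is the kind of mismatch the summary describes.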