Entity · benchmark

SpatialWorld

benchmark · active · spatialworld-54d344ee · 1 event · first seen Jun 9, 2026

Aliases: SpatialWorld

More like this (12)

Meta-World · OSWorld · ScienceWorld · DevicesWorld · WorldString · Word2World · WorldEvolver · OpenSpace · DeepSWIP · Mental World Modeling · stable-worldmodel · World from Motion

Recent events (1)

6 · arXiv · cs.CL · Jun 9, 2026

SpatialWorld benchmark evaluates interactive spatial reasoning of multimodal agents in real-world tasks

Researchers introduce SpatialWorld, a benchmark for evaluating interactive spatial understanding of multimodal agents across 760 human-annotated tasks spanning household, travel, and social domains. The benchmark integrates eight simulation backends under a shared protocol, requiring agents to operate under vision-only partial observability with egocentric inputs. Evaluation of 15 agents reveals that even the strongest model, GPT-5, achieves only a 17.4% task success rate, exposing significant gaps in active exploration and long-horizon planning. The work highlights a mismatch between task success and execution efficiency as a key bottleneck for spatial agents.

Evaluation and Benchmarking · Agent and Tool Ecosystem · SpatialWorld · OpenAI · Qwen 3.5