SpatialClaw
SpatialClaw: Code-as-action interface for agentic 3D/4D spatial reasoning with VLMs
SpatialClaw is a training-free framework that uses code execution as the action interface for vision-language model (VLM) agents performing spatial reasoning tasks. The system maintains a stateful Python kernel with perception and geometry primitives, so the VLM writes executable cells iteratively, each conditioned on prior outputs, rather than committing to a full strategy upfront. Evaluated on 20 spatial reasoning benchmarks covering static and dynamic 3D/4D tasks, SpatialClaw achieves 59.9% average accuracy, outperforming the prior state-of-the-art spatial agent by 11.2 points across six VLM backbones.
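The code-as-action loop described above can be illustrated with a minimal sketch. This is not SpatialClaw's implementation: `StatefulKernel`, `distance_3d`, `scripted_policy`, and `run_agent` are hypothetical names invented for illustration, and a hard-coded policy stands in for the VLM. It only shows the general idea of a persistent namespace in which each cell can reuse variables from earlier cells and the next cell is chosen after seeing prior outputs.

```python
import contextlib
import io
import math


def distance_3d(p, q):
    """Hypothetical geometry primitive: Euclidean distance between two 3D points."""
    return math.dist(p, q)


class StatefulKernel:
    """Toy stand-in for a persistent Python kernel (assumption, not the real system)."""

    def __init__(self, primitives):
        # Primitives are placed in a namespace that is shared by every cell.
        self.namespace = dict(primitives)

    def run_cell(self, source):
        """Execute one cell and return what it printed, or the error text."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(source, self.namespace)
        except Exception as exc:
            # Errors are returned as observations so the loop can continue.
            return f"{type(exc).__name__}: {exc}"
        return buffer.getvalue().strip()


def scripted_policy(history):
    """Placeholder for the VLM: chooses the next cell from the outputs seen so far."""
    if not history:
        return "chair = (0.0, 0.0, 0.0); table = (3.0, 4.0, 0.0); print('ok')"
    last_output = history[-1][1]
    if len(history) == 1 and last_output == "ok":
        # This cell relies on variables defined by the previous cell.
        return "d = distance_3d(chair, table); print(d)"
    return None  # nothing more to do


def run_agent(kernel, policy, max_steps=5):
    """Alternate between asking the policy for a cell and executing it."""
    history = []
    for _ in range(max_steps):
        cell = policy(history)
        if cell is None:
            break
        history.append((cell, kernel.run_cell(cell)))
    return history


kernel = StatefulKernel({"distance_3d": distance_3d})
trace = run_agent(kernel, scripted_policy)
```

Here `trace` holds each executed cell paired with its captured output, and the second cell works only because `chair` and `table` persist in the kernel's namespace from the first. In a real agent the policy would be a VLM call and the primitives would wrap perception models, neither of which is modeled in this sketch.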