Almanac
benchmark

SpaceNum

benchmark · active · provisional · spacenum-e182394c · 1 event · first seen 22d ago

Aliases: SpaceNum

Recent events (1)

6 · arXiv · cs.AI · 22d ago

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

SpaceNum is a new evaluation framework that probes whether Vision-Language Models (VLMs) genuinely ground numerical outputs, such as coordinates and action magnitudes, in spatial perception rather than relying on shallow cues. The benchmark defines two bidirectional tasks, Num2Space and Space2Num, across dynamic and static spatial settings. Results show that current VLMs perform near random chance on spatial numerical grounding; explicit reasoning yields only marginal improvement, and fine-tuning yields partial gains.