BAGEL
bagel-7e1b8700 · 1 event · first seen 13d ago
Aliases: BAGEL
Recent events (1)
Imaginative Perception Tokens improve spatial reasoning in vision-language models
Researchers introduce Imaginative Perception Tokens (IPT), intermediate perceptual representations that externalize what a vision-language model (VLM) would perceive from alternative spatial viewpoints, enabling it to reason about unobserved spatial structure. The approach is evaluated on three new tasks (Perspective Taking, Path Tracing, and Multiview Counting) using ~20K examples, with models built on the BAGEL backbone. IPT supervision consistently outperforms textual chain-of-thought training on these spatial tasks. The authors also find that forcing spatial computation through language can degrade performance, which suggests a modality mismatch. The work offers both a practical supervision technique and a diagnostic finding about the limits of language-mediated spatial reasoning.