Imaginative Perception Tokens

technique · active · provisional · imaginative-perception-tokens-a8d294fe · 1 event · first seen 13d ago

Aliases: Imaginative Perception Tokens

Recent events (1)

5 · arXiv · cs.AI · 13d ago

Imaginative Perception Tokens improve spatial reasoning in vision-language models

Researchers introduce Imaginative Perception Tokens (IPT), intermediate perceptual representations that externalize what a vision-language model (VLM) would perceive from alternative spatial viewpoints, enabling it to reason about spatial structure it has not observed. The approach is evaluated on three new tasks (Perspective Taking, Path Tracing, and Multiview Counting) using about 20K examples and a model built on the BAGEL backbone. IPT supervision consistently outperforms textual chain-of-thought training on spatial tasks. The authors also find that forcing spatial computation through language can degrade performance, which suggests a modality mismatch. The work thus offers both a practical supervision technique and a diagnostic finding about the limits of language-mediated spatial reasoning.