organization
baulab
organization · active · provisional
baulab-25464d4c · 1 event · first seen 2d ago
Aliases: baulab
Recent events (1)
Gaze Heads: Attention heads in VLMs that track and control image region description
Researchers identify a small set of attention heads in vision-language model backbones, called 'gaze heads', whose attention patterns track the image region currently being described. Using comic strips as a controlled testbed, they show that intervening on the top-100 gaze heads (fewer than 9% of all heads) can steer the model to describe any chosen region at 83.1% accuracy, without retraining. The mechanism generalizes across model sizes from 2B to 32B parameters and to natural images (COCO), establishing a practical inference-time control lever for multimodal models via mechanistic analysis.
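A minimal sketch of what "finding" and "intervening on" such heads could look like, using toy attention weights. This is not the researchers' implementation: the function names `rank_heads_by_region_mass` and `steer_heads`, the scoring rule (share of a head's attention that lands on the described region), and the `boost` reweighting are all assumptions made for illustration.

```python
def rank_heads_by_region_mass(attn, region, top_k):
    """Rank heads by the share of their attention that falls inside a region.

    attn:   list of rows, one per head; each row holds that head's attention
            weights over image tokens (hypothetical toy layout).
    region: list of booleans, True where an image token belongs to the region
            currently being described.
    Returns the indices of the top_k highest-scoring heads.
    """
    scores = []
    for head, row in enumerate(attn):
        total = sum(row)
        in_region = sum(w for w, inside in zip(row, region) if inside)
        scores.append((in_region / total if total > 0 else 0.0, head))
    # Highest score first; ties broken by lower head index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [head for _, head in scores[:top_k]]


def steer_heads(attn, head_ids, target, boost=10.0):
    """Shift the chosen heads' attention toward a target region.

    Weights inside the target are multiplied by `boost` (an assumed knob),
    then each edited row is rescaled to keep its original total mass.
    Heads not listed in head_ids are left unchanged.
    """
    steered = [list(row) for row in attn]
    for head in head_ids:
        row = attn[head]
        total = sum(row)
        scaled = [w * (boost if inside else 1.0) for w, inside in zip(row, target)]
        norm = sum(scaled)
        if norm > 0:
            steered[head] = [w / norm * total for w in scaled]
    return steered


# Toy example: 3 heads attending over 4 image tokens.
attn = [
    [0.25, 0.25, 0.25, 0.25],  # diffuse head
    [0.70, 0.10, 0.10, 0.10],  # head concentrated on token 0
    [0.10, 0.10, 0.40, 0.40],  # head concentrated on tokens 2 and 3
]
described_region = [True, False, False, False]
gaze_like = rank_heads_by_region_mass(attn, described_region, top_k=1)

new_target = [False, False, True, True]
steered = steer_heads(attn, gaze_like, new_target)
```

In a real model the reweighting would be applied inside the forward pass for the selected heads only, which is what makes this kind of control possible at inference time without retraining.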