vision-language grounding
other · active · provisional
vision-language-grounding-dedbbeed · 1 event · first seen 21d ago
Aliases: vision-language grounding
More like this (12)
visual language model
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Vision-Language Models
Vision-Language-Action models
referential grounding
Vision-Language-Action model
contrastive vision-language pretraining
GUI grounding
language-aware adapter heads
LANG
language-adaptive switch
Connecting Speech to Words through Images
Recent events (1)
LocateAnything: Parallel Box Decoding for Fast and Accurate Vision-Language Grounding
LocateAnything introduces Parallel Box Decoding (PBD), a method that decodes bounding boxes and points as atomic units in a single step rather than sequentially token-by-token, improving both throughput and geometric coherence in visual grounding tasks. The framework is paired with a large-scale data engine producing LocateAnything-Data, a 138-million-sample training dataset for high-precision localization. Evaluations show advances on the speed-accuracy frontier across diverse grounding and detection benchmarks. The work addresses a fundamental architectural mismatch in how current VLMs handle 2D spatial coordinates.
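The contrast between token-by-token coordinate decoding and decoding a box as one atomic unit can be illustrated with a toy sketch. This is not LocateAnything's implementation: the names `sequential_decode`, `parallel_decode`, `toy_token_step`, `toy_box_head`, and the `TARGETS` values are all hypothetical stand-ins, and the "model" here simply looks up fixed values. The sketch only shows the difference in the number of decoding steps, assuming a box is four coordinates.

```python
from typing import Callable, List, Tuple

# Assumed box layout for this sketch only: (x1, y1, x2, y2) in [0, 1].
Box = Tuple[float, float, float, float]

# Hypothetical "ground truth" that the toy model functions below return.
TARGETS: List[Box] = [(0.1, 0.2, 0.5, 0.6), (0.3, 0.3, 0.9, 0.8)]


def toy_token_step(box_idx: int, coord_idx: int, prefix: Tuple[float, ...]) -> float:
    """Stand-in for one autoregressive step that emits a single coordinate.

    A real decoder would condition on `prefix`; this toy version ignores it.
    """
    return TARGETS[box_idx][coord_idx]


def toy_box_head(box_idx: int) -> Box:
    """Stand-in for a head that emits all four coordinates in one step."""
    return TARGETS[box_idx]


def sequential_decode(
    step_fn: Callable[[int, int, Tuple[float, ...]], float], num_boxes: int
) -> Tuple[List[Box], int]:
    """Decode each box coordinate by coordinate, counting decoder steps."""
    boxes: List[Box] = []
    steps = 0
    for b in range(num_boxes):
        coords: List[float] = []
        for k in range(4):
            # Each coordinate needs its own step and waits on the earlier ones.
            coords.append(step_fn(b, k, tuple(coords)))
            steps += 1
        boxes.append((coords[0], coords[1], coords[2], coords[3]))
    return boxes, steps


def parallel_decode(
    head_fn: Callable[[int], Box], num_boxes: int
) -> Tuple[List[Box], int]:
    """Decode each box as one atomic unit, counting decoder steps."""
    boxes: List[Box] = []
    steps = 0
    for b in range(num_boxes):
        boxes.append(head_fn(b))  # all four coordinates come out together
        steps += 1
    return boxes, steps


seq_boxes, seq_steps = sequential_decode(toy_token_step, len(TARGETS))
par_boxes, par_steps = parallel_decode(toy_box_head, len(TARGETS))
print(seq_steps, par_steps)  # prints "8 2"
```

In this toy setting both routes return the same boxes, but the sequential route takes four steps per box and the atomic route takes one. Any real throughput or accuracy difference depends on the actual model and is not captured here.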