Entity · other

vision-language grounding

other · active · vision-language-grounding-dedbbeed · 1 event · first seen May 27, 2026

Aliases: vision-language grounding

Co-occurring entities

Parallel Box Decoding · IoU · LocateAnything

More like this (12)

visual language model
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Visually Grounded Self-Reflection for Vision-Language Models via Reinforcement Learning
Vision-Language Models
Vision-Language-Action models
referential grounding
Vision-Language-Action model
contrastive vision-language pretraining
GUI grounding
language-aware adapter heads
LANG
A Learning-Rate-Gated Failure of GRPO in a Small Language and Vision-Language Model Web Agent

Recent events (1)

6 · arXiv · cs.LG · May 27, 2026

LocateAnything: Parallel Box Decoding for Fast and Accurate Vision-Language Grounding

LocateAnything introduces Parallel Box Decoding (PBD), a method that decodes bounding boxes and points as atomic units in a single step rather than sequentially token-by-token, improving both throughput and geometric coherence in visual grounding tasks. The framework is paired with a large-scale data engine producing LocateAnything-Data, a 138-million-sample training dataset for high-precision localization. Evaluations show advances on the speed-accuracy frontier across diverse grounding and detection benchmarks. The work addresses a fundamental architectural mismatch in how current VLMs handle 2D spatial coordinates.

Evaluation and Benchmarking · Inference Economics · Parallel Box Decoding · IoU · vision-language grounding · +2 more
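
To make the two ideas in the summary above concrete, here is a minimal Python sketch. It is not the LocateAnything implementation. The iou function is the standard intersection-over-union measure for axis-aligned boxes, the metric listed among this entity's tags. The decode_steps function is a hypothetical toy counter that assumes a token-by-token decoder spends one step per coordinate (four per box) while a parallel box decoder spends one step per box. The box format (x_min, y_min, x_max, y_max), both function names, and the tokens_per_box default are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def decode_steps(n_boxes: int, tokens_per_box: int = 4, parallel: bool = False) -> int:
    """Toy step count (hypothetical): one step per coordinate token when
    decoding sequentially, one step per box when each box is an atomic unit."""
    return n_boxes if parallel else n_boxes * tokens_per_box


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth))
    print(decode_steps(3), decode_steps(3, parallel=True))
```

Under these assumptions, three boxes cost twelve sequential steps versus three parallel ones. The real throughput gain depends on details of the model that the summary does not give.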