Mahalanobis distance
mahalanobis-distance-f2f8b5d2 · 1 event · first seen 9h ago
Aliases: Mahalanobis distance
Recent events (1)
Argus benchmark evaluates uncertainty quantification methods for computer-use GUI agents across VLMs and datasets
Researchers introduce Argus, a cross-regime benchmark for post-hoc uncertainty quantification (UQ) in single-step GUI grounding agents, covering 27 methods across 4 open-weight VLMs and 4 datasets, plus an 8-method closed-source matrix across 3 frontier vendors. The central finding is 'selective transfer': UQ rankings are stable across datasets for a fixed model but degrade across model classes and observable interfaces, with cross-tier transfer to closed-source vendors averaging only +0.08 Spearman correlation. Hidden-state and density methods prove most stable for open-weight models, while conformal click regions reveal that score-level discrimination alone is insufficient for deployment safety. The benchmark releases per-item records and analysis scripts to support regime-aware UQ selection in GUI agents.
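The summary does not say how the benchmark's density methods are computed. As a rough illustration of how the Mahalanobis distance can act as a density-style uncertainty score, the sketch below fits a Gaussian to a set of reference feature vectors and scores new vectors by their distance from it. The function names, the ridge term, and the synthetic data are all assumptions made for illustration and are not taken from Argus.

```python
import numpy as np


def fit_gaussian(features, ridge=1e-6):
    """Estimate the mean and inverse covariance of reference features.

    `ridge` is a small diagonal term (an assumption of this sketch) that
    keeps the covariance matrix invertible.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    cov = cov + ridge * np.eye(cov.shape[0])
    precision = np.linalg.inv(cov)
    return mean, precision


def mahalanobis(x, mean, precision):
    """Return sqrt((x - mean)^T * precision * (x - mean)) for each row of x."""
    diff = np.atleast_2d(x) - mean
    squared = np.einsum("ij,jk,ik->i", diff, precision, diff)
    return np.sqrt(np.maximum(squared, 0.0))


# Synthetic stand-in for hidden-state features (hypothetical data).
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))
mean, precision = fit_gaussian(reference)

near = mahalanobis(mean, mean, precision)         # a point at the center
far = mahalanobis(mean + 10.0, mean, precision)   # a point far from the data
```

A larger distance means the input lies further from the reference distribution, so under these assumptions it would be read as higher uncertainty.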