A History-Aware Visually Grounded Critic for Computer Use Agents

paper · active · a-history-aware-visually-grounded-critic-for-computer-use-agents-0f9a6c09 · 1 event · first seen Jun 10, 2026

Aliases: A History-Aware Visually Grounded Critic for Computer Use Agents

More like this (12)

Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets
Cognitive-structured Multimodal Agent
The Blind Curator: How a Biased Judge Silently Disables Skill Retirement in Self-Evolving Agents
GUI Agents
Computer-Using Agent
Symbolic Geometric Agent
Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents
Semantic Agent
Visually Grounded Self-Reflection for Vision-Language Models via Reinforcement Learning
How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope
Rethinking Inference-Time Scaling in Local Computer-Use Agents: Failure Modes and Compute Tradeoffs
tool-augmented language agents

Recent events (1)

arXiv · cs.CL · Jun 10, 2026

HiViG: History-aware visually grounded critic improves computer use agents across GUI benchmarks

Researchers introduce HiViG, a test-time framework for Computer Use Agents that addresses two weaknesses in existing critic models: short-sighted decision loops and lack of visual grounding. The system trains a multimodal critic on real GUI trajectories to maintain a compact macro-action history and verify execution coordinates against live screenshots before action execution. Evaluated on web, mobile, and desktop benchmarks, HiViG improves average success rates by 5.8% over the strongest baseline with Qwen3-VL-32B and 9.0% with Gemini-3-Flash, with both history and grounding components shown to be independently necessary.

Evaluation and Benchmarking · Agent and Tool Ecosystem · HiViG · A History-Aware Visually Grounded Critic for Computer Use Agents · Gemini 3 Flash