5Hugging Face Blog·1mo ago

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

H Company has released Holo1, a new family of vision-language models specifically designed for GUI automation tasks. These models power Surfer-H, a GUI agent capable of interacting with graphical interfaces. The release represents a specialized VLM family targeting the agent-tool ecosystem for desktop/web automation. Details on architecture, training data, and benchmarks are expected in the accompanying blog post.

Agent and Tool Ecosystem Multimodal Progress Surfer-H Hugging Face Holo1 H Company

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI


Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Related events (8)

6Hugging Face Blog·18d ago

H Company releases Holo3.1: fast local computer use agent model

H Company published a Hugging Face blog post announcing Holo3.1, a model for computer use agents that runs locally. The release targets fast, on-device computer control, placing it in the growing space of open, local agentic models. The post body is minimal, but the announcement marks a new entrant in the local computer-use agent category.

Open Weights Progress Agent and Tool Ecosystem Holo3.1 H Company

5Hugging Face Blog·1mo ago

H Company's Holo2 235B-A22B Model Leads in UI Localization

H Company has released Holo2, a 235B-parameter mixture-of-experts model with 22B active parameters, announced via the Hugging Face blog. The model is positioned as a leader in UI localization tasks, suggesting a focus on agent-oriented or multimodal UI understanding. The post appears to be a model introduction from H Company, a relatively new AI lab.

Frontier Model Releases Agent and Tool Ecosystem Hugging Face Holo2 H Company +1 more

5Hugging Face Blog·1mo ago

smolagents Now Supports Vision-Language Models

Hugging Face has added vision-language model (VLM) support to its smolagents framework, enabling agents to process and reason over visual inputs alongside text. This update extends the agentic tooling ecosystem to multimodal workflows. The announcement comes from the Hugging Face blog, which serves as the primary communication channel for the smolagents project.

Agent and Tool Ecosystem Multimodal Progress Vision-Language Models Hugging Face smolagents

5Hugging Face Blog·1mo ago

Smol2Operator: Post-Training GUI Agents for Computer Use

Hugging Face published a blog post introducing Smol2Operator, a post-training approach for building GUI agents capable of computer use tasks. The work focuses on training small language models to operate graphical user interfaces, extending the SmolLM2 model family into the agent/computer-use domain. The post likely covers training methodology, datasets, and evaluation of the resulting GUI agent capabilities.

Open Weights Progress Agent and Tool Ecosystem Smol2Operator SmolLM2 Hugging Face +1 more

5Hugging Face Blog·1mo ago

Holotron-12B - High Throughput Computer Use Agent

H Company has released Holotron-12B, a 12-billion-parameter model designed for computer use agent tasks with a focus on high throughput. The model was announced via the Hugging Face blog, suggesting it is or will soon be available on the platform. Details on architecture, benchmarks, and capabilities are not present in the provided body text.

Frontier Model Releases Inference Economics Hugging Face Hcompany Holotron-12B +1 more

6arXiv · cs.CL·10d ago

HiViG: History-aware visually grounded critic improves computer use agents across GUI benchmarks

Researchers introduce HiViG, a test-time framework for Computer Use Agents that addresses two weaknesses in existing critic models: short-sighted decision loops and a lack of visual grounding. The system trains a multimodal critic on real GUI trajectories to maintain a compact macro-action history and to verify execution coordinates against live screenshots before an action is executed. Evaluated on web, mobile, and desktop benchmarks, HiViG improves average success rates by 5.8% over the strongest baseline with Qwen3-VL-32B and by 9.0% with Gemini-3-Flash, and both the history and grounding components are shown to be independently necessary.

Evaluation and Benchmarking Agent and Tool Ecosystem HiViG A History-Aware Visually Grounded Critic for Computer Use Agents Gemini 3 Flash +2 more

6Hugging Face Blog·1mo ago

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

TII UAE has released Falcon-H1, a new family of hybrid-head language models combining attention and state-space mechanisms to improve efficiency and performance. The models are published on Hugging Face and represent TII's latest iteration in the Falcon series. The hybrid architecture targets better inference economics and competitive benchmark results relative to model size.

Frontier Model Releases Open Weights Progress Hugging Face Hybrid-Head Architecture Falcon-H1 +2 more

5Hugging Face Blog·1mo ago

ScreenSuite: Comprehensive Evaluation Suite for GUI Agents

Hugging Face has released ScreenSuite, described as the most comprehensive evaluation suite for GUI (Graphical User Interface) agents. The suite aims to standardize and broaden benchmarking for agents that interact with visual interfaces. It addresses a gap in the evaluation ecosystem for screen-based AI agents, which are increasingly relevant as agentic systems expand into desktop and web automation.

Evaluation and Benchmarking Agent and Tool Ecosystem GUI Agents ScreenSuite Hugging Face