5 · Hugging Face Blog · 1mo ago

Smol2Operator: Post-Training GUI Agents for Computer Use

Hugging Face published a blog post introducing Smol2Operator, a post-training approach for building GUI agents capable of computer use tasks. The work focuses on training small language models to operate graphical user interfaces, extending the SmolLM2 model family into the agent/computer-use domain. The post likely covers training methodology, datasets, and evaluation of the resulting GUI agent capabilities.

Open Weights Progress · Agent and Tool Ecosystem · Alignment and RLHF · Smol2Operator · SmolLM2 · Hugging Face

Related guides (4)

Hugging Face

Hugging Face: The Home of Open-Source AI

Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Alignment and RLHF · Topic guide

Alignment and RLHF: Teaching AI Models to Behave

Related events (8)

5 · Hugging Face Blog · 1mo ago

smolagents Now Supports Vision-Language Models

Hugging Face has added vision-language model (VLM) support to its smolagents framework, enabling agents to process and reason over visual inputs alongside text. This update extends the agentic tooling ecosystem to multimodal workflows. The announcement comes from the Hugging Face blog, which serves as the primary communication channel for the smolagents project.

Agent and Tool Ecosystem · Multimodal Progress · Vision-Language Models · Hugging Face · smolagents

5 · Hugging Face Blog · 1mo ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Hugging Face introduces SmolVLA, a compact Vision-Language-Action model designed for robotics control, trained on community-contributed data from the LeRobot ecosystem. The model targets efficient deployment on resource-constrained hardware while maintaining competitive manipulation performance. This release represents a continuation of Hugging Face's strategy to democratize robotics AI through open community data pipelines.

Open Weights Progress · Agent and Tool Ecosystem · LeRobot · Vision-Language-Action model · Hugging Face

5 · Hugging Face Blog · 1mo ago

SmolVLM2: Bringing Video Understanding to Every Device

Hugging Face introduces SmolVLM2, a family of compact vision-language models designed for video understanding on resource-constrained devices. The models extend the SmolVLM line with video comprehension capabilities while maintaining small footprints suitable for edge and on-device deployment. The release targets democratizing multimodal video understanding beyond cloud-only inference.

Open Weights Progress · Inference Economics · SmolVLM · SmolVLM2 · Hugging Face

5 · Hugging Face Blog · 1mo ago

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

H Company has released Holo1, a new family of vision-language models specifically designed for GUI automation tasks. These models power Surfer-H, a GUI agent capable of interacting with graphical interfaces. The release represents a specialized VLM family targeting the agent-tool ecosystem for desktop/web automation. Details on architecture, training data, and benchmarks are expected in the accompanying blog post.

Agent and Tool Ecosystem · Multimodal Progress · Surfer-H · Hugging Face · Holo1

5 · Hugging Face Blog · 1mo ago

SmolLM: Hugging Face Releases Blazingly Fast Small Language Models

Hugging Face introduces SmolLM, a family of small language models designed for on-device and edge deployment with high speed and competitive performance. The models are positioned as efficient alternatives for resource-constrained environments. The release includes model weights and associated tooling on the Hugging Face Hub.

Frontier Model Releases · Open Weights Progress · SmolLM · Hugging Face

6 · Hugging Face Blog · 1mo ago

Introducing smolagents: simple agents that write actions in code

Hugging Face has released smolagents, a lightweight agent framework where agents express actions as executable Python code rather than structured JSON tool calls. The library is designed for simplicity and composability, allowing agents to chain tool calls and manipulate outputs programmatically within a single code block. The release positions smolagents as a minimal alternative to heavier orchestration frameworks, with native integration into the Hugging Face ecosystem.

Inference Economics · Agent and Tool Ecosystem · code-as-action agents · Hugging Face · smolagents
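
The contrast described in the smolagents summary, actions written as code rather than as structured JSON tool calls, can be illustrated with a small toy. This is only a sketch and does not use the smolagents library or reflect its internals: the tools `search_price` and `multiply` and the helpers `run_json_action` and `run_code_action` are hypothetical names invented here. It shows how a JSON-style action needs one dispatch per tool call, while a code-style action can chain calls and pass intermediate values inside a single block.

```python
import json

# Hypothetical tools, invented for illustration (not part of smolagents).
def search_price(item: str) -> float:
    """Look up a unit price in a tiny hard-coded catalog."""
    prices = {"apple": 0.5, "pear": 0.75}
    return prices[item]

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

TOOLS = {"search_price": search_price, "multiply": multiply}

def run_json_action(action_json: str):
    """Dispatch a single structured tool call of the form
    {"tool": <name>, "arguments": {...}}. One call per action."""
    action = json.loads(action_json)
    return TOOLS[action["tool"]](**action["arguments"])

def run_code_action(code: str):
    """Execute a code action with only the tools in scope and return
    whatever the snippet stored in `result`.
    Emptying __builtins__ is NOT a real sandbox; it only keeps this toy small."""
    namespace = {"__builtins__": {}, **TOOLS}
    exec(code, namespace)
    return namespace["result"]

# JSON style: two separate actions, with the caller carrying `price` between them.
price = run_json_action(
    '{"tool": "search_price", "arguments": {"item": "apple"}}'
)
total_json = run_json_action(
    json.dumps({"tool": "multiply", "arguments": {"a": price, "b": 4}})
)

# Code style: one action chains both tool calls and keeps the intermediate value.
code_action = """
price = search_price("apple")
result = multiply(price, 4)
"""
total_code = run_code_action(code_action)

print(total_json, total_code)
```

Both paths compute the same total; the difference is that the code action expresses the whole chain in one step. Any real deployment would need proper isolation for executing model-written code, which this sketch deliberately leaves out.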

5 · Hugging Face Blog · 1mo ago

SmolVLM - Small Yet Mighty Vision Language Model

Hugging Face introduces SmolVLM, a compact vision-language model designed to deliver strong multimodal performance at small parameter counts. The model targets edge and resource-constrained deployment scenarios while maintaining competitive capabilities relative to its size. The announcement highlights efficiency improvements in both training and inference for small-scale VLMs.

Open Weights Progress · Inference Economics · SmolVLM · Hugging Face

4 · Hugging Face Blog · 1mo ago

DeepMath: A Lightweight Math Reasoning Agent with smolagents

Hugging Face published a blog post introducing DeepMath, a lightweight mathematical reasoning agent built on the smolagents framework. The post demonstrates how to construct a capable math reasoning agent using small models and tool-use patterns. This represents a practical application of the agent-tool ecosystem for specialized reasoning tasks.

Inference Economics · Agent and Tool Ecosystem · Hugging Face · DeepMath · smolagents