Simon Willison's Weblog · 3d ago

Simon Willison demonstrates using shot-scraper video for agent workflow recording

Simon Willison describes a technique for having AI agents record video demonstrations of their browser-based work using the shot-scraper video tool. The approach enables automated capture of agent activity for debugging, documentation, or demonstration purposes. This is a practical tooling pattern relevant to anyone building or evaluating web-browsing agents.

Agent and Tool Ecosystem · shot-scraper · Simon Willison

Related guides (2)

Simon Willison

Simon Willison: The Practitioner's Guide to the AI Landscape


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Related events (8)

arXiv · cs.AI · 4d ago

Rhetor: Multi-agent system for rehearsed live product demos with real-time voice Q&A

Researchers introduce Rhetor, a multi-agent system that automates live software product demonstrations: it takes a running web application and its source code as input, then produces a rehearsed demo with synchronized narration and real-time voice question answering. The system combines UI exploration with source-code analysis, uses semantic locators to dispatch browser actions, and includes a pre-presentation rehearsal loop with graceful degradation. Evaluated across six pipeline sessions on four deployed applications, the system achieves high locator-firing rates (σ̄ ≈ 0.92 on a 53-action workload) and converges to perfect locator resolution on a public-domain reference app. The paper also proposes a ten-metric benchmark protocol for evaluating demo automation systems.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Excalidraw · Rhetor

Simon Willison's Weblog · Jun 13, 2026

Simon Willison adds document context to OpenAI WebRTC Audio Session tool

Simon Willison documents an update to his OpenAI WebRTC Audio Session tool that adds document context capabilities, allowing audio sessions to incorporate document content. The post covers practical integration of OpenAI's real-time audio API with document-grounded context. This is a hands-on tooling walkthrough relevant to practitioners building voice-enabled AI applications.

Agent and Tool Ecosystem · Simon Willison · OpenAI · OpenAI WebRTC Audio Session

arXiv · cs.CL · 2d ago

Scalable behaviour cloning for browser agents via skill distillation from human interaction traces

A new arXiv preprint proposes converting human browser interaction trajectories into compact natural-language skills that agents can retrieve and compose, arguing that the bottleneck for browser agents is decision-making under incomplete information rather than low-level operations. The approach organizes distilled skills into a skill graph to enable consolidation rather than unbounded accumulation. The work positions collective human browsing behavior as a scalable, under-exploited source of reusable agent priors, potentially reducing reliance on manually designed task demonstrations.
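The consolidation idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that two skills are "the same" when their normalized descriptions match exactly; the preprint's actual merging criterion is not described here and is likely semantic. `Skill`, `SkillGraph`, and `add_trace` are hypothetical names. Repeated skills increase a support count instead of adding new nodes, and consecutive skills in a trace become edges.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    support: int = 1  # number of times this skill was seen across traces
    next_skills: set[str] = field(default_factory=set)  # outgoing edges


class SkillGraph:
    """Hypothetical skill graph that merges duplicates instead of appending them."""

    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    @staticmethod
    def _key(description: str) -> str:
        # Assumption: exact match after lowercasing and collapsing whitespace.
        return " ".join(description.lower().split())

    def add_trace(self, descriptions: list[str]) -> None:
        prev = None
        for desc in descriptions:
            key = self._key(desc)
            if key in self.skills:
                self.skills[key].support += 1  # consolidate an existing skill
            else:
                self.skills[key] = Skill(name=key, description=desc)
            if prev is not None:
                self.skills[prev].next_skills.add(key)  # record skill ordering
            prev = key
```

Two traces that share an opening step yield three nodes rather than four, with the shared step's support count at 2 and edges to both follow-up skills.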

Evaluation and Benchmarking · Agent and Tool Ecosystem · einsia.ai · Scalable Behaviour Cloning on Browser Using via Skill Distillation

Simon Willison's Weblog · Jun 22, 2026

Simon Willison on Temporary Cloudflare Accounts for AI Agents

Simon Willison covers a Cloudflare feature enabling temporary accounts for AI agents, which allows agents to provision and use cloud resources ephemerally. The post highlights an emerging infrastructure pattern where AI agents are granted scoped, time-limited credentials rather than persistent access. This is relevant to the agent-tool ecosystem as it addresses identity and resource management for autonomous agents.
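The scoped, time-limited credential pattern can be sketched in a few lines. This is not Cloudflare's API: `EphemeralCredential`, `issue_credential`, `authorize`, and scope strings such as `"workers:deploy"` are hypothetical. The sketch only shows the two checks that distinguish such credentials from persistent access: the action must be within the granted scopes, and the current time must be before expiry.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class EphemeralCredential:
    """Hypothetical short-lived credential handed to an agent."""
    token: str
    scopes: frozenset[str]
    expires_at: float  # Unix timestamp after which the credential is invalid


def issue_credential(scopes, ttl_seconds, now=None):
    # `now` is injectable so expiry can be tested deterministically.
    now = time.time() if now is None else now
    return EphemeralCredential(
        token=secrets.token_urlsafe(32),
        scopes=frozenset(scopes),
        expires_at=now + ttl_seconds,
    )


def authorize(cred, required_scope, now=None):
    # Both conditions must hold: not expired AND the action is in scope.
    now = time.time() if now is None else now
    return now < cred.expires_at and required_scope in cred.scopes
```

For example, a one-hour credential scoped to `"workers:deploy"` authorizes a deploy 30 minutes in, rejects an out-of-scope `"dns:edit"`, and rejects everything once the hour has passed.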

Inference Economics · Agent and Tool Ecosystem · Simon Willison · Cloudflare

Hugging Face Blog · May 19, 2026

ScreenSuite: Comprehensive Evaluation Suite for GUI Agents

Hugging Face has released ScreenSuite, described as the most comprehensive evaluation suite for GUI (Graphical User Interface) agents. The suite aims to standardize and broaden benchmarking for agents that interact with visual interfaces. This addresses a gap in the evaluation ecosystem for screen-based AI agents, which are increasingly relevant as agentic systems expand into desktop and web automation tasks.

Evaluation and Benchmarking · Agent and Tool Ecosystem · GUI Agents · ScreenSuite · Hugging Face

Simon Willison's Weblog · 32h ago

Simon Willison uses DSPy to evaluate and optimize Datasette Agent SQL system prompts

Simon Willison documents an experiment using DSPy to systematically evaluate and improve the SQL system prompts used by Datasette Agent. The post covers applying DSPy's prompt optimization framework to a real-world agentic tool, demonstrating a practical workflow for automated prompt engineering. This is a hands-on practitioner account of using DSPy for prompt evaluation in a production-adjacent context.
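DSPy optimizers and its evaluator score a program with a metric: a callable taking `(example, pred, trace=None)` and returning a bool or number. The sketch below is one plausible metric for SQL generation, not the one used in the post. It assumes the gold query and the model's query live on hypothetical `.sql` attributes, runs both against SQLite, and compares result rows as multisets so that row order does not matter. A query that fails to execute scores `False`. It assumes a disposable database, since a generated statement could modify data.

```python
import sqlite3
from collections import Counter


def make_sql_metric(conn: sqlite3.Connection):
    """Build a DSPy-style metric: metric(example, pred, trace=None) -> bool."""

    def run(sql):
        # Execute a query and return its rows as an order-insensitive multiset.
        try:
            return Counter(conn.execute(sql).fetchall())
        except sqlite3.Error:
            return None  # invalid SQL or runtime error

    def metric(example, pred, trace=None):
        # `example.sql` and `pred.sql` are assumed field names.
        expected = run(example.sql)
        actual = run(pred.sql)
        return actual is not None and actual == expected

    return metric
```

Comparing execution results rather than query text rewards semantically equivalent SQL, such as `x > 1` versus `x >= 2` on integer data, which a string-match metric would reject.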

Evaluation and Benchmarking · Agent and Tool Ecosystem · DSPy · Simon Willison · Datasette

GitHub Trending · 6d ago

video-use: open-source library for editing videos with coding agents

browser-use/video-use is a Python library that lets AI coding agents edit videos programmatically. It has accumulated over 10,000 GitHub stars, with strong daily momentum (+216 stars). The project extends the browser-use agent paradigm to video-editing workflows, and the high star count signals significant community interest in agent-driven media-manipulation tooling.

Agent and Tool Ecosystem · video-use · Browser-Use

Simon Willison's Weblog · Jun 9, 2026

Simon Willison on setting custom model prices in AgentsView

Simon Willison documents a workflow for configuring custom pricing for models within AgentsView, a tool for tracking AI agent costs. The post addresses a practical need for practitioners who use models not yet priced in the tool's default database. It is a short how-to from a tier-2 commentary source with minimal body content available.
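Whatever the configuration format, a custom price usually reduces to per-token rates for input and output. The sketch below shows that arithmetic only; it does not reproduce AgentsView's configuration or code. `ModelPrice`, `CUSTOM_PRICES`, `session_cost`, the model name `"my-local-model"`, and the dollar figures are all hypothetical, and prices are assumed to be quoted in USD per million tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_mtok: float   # USD per million input tokens (assumed unit)
    output_per_mtok: float  # USD per million output tokens (assumed unit)


# Hypothetical entry for a model missing from the tool's default price database.
CUSTOM_PRICES = {
    "my-local-model": ModelPrice(input_per_mtok=0.50, output_per_mtok=1.50),
}


def session_cost(model, input_tokens, output_tokens, prices=CUSTOM_PRICES):
    """Return the cost in USD; raises KeyError if the model has no price."""
    price = prices[model]
    return (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000
```

With these placeholder rates, a session using 2 million input tokens and 1 million output tokens costs 2 × $0.50 + 1 × $1.50 = $2.50.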

Inference Economics · Agent and Tool Ecosystem · Simon Willison · AgentsView