OpenClaw

product · active · openclaw-c3fc4d2c · 21 events · first seen May 18, 2026

Aliases: OpenClaw

More like this (12)

SpatialClaw · VulnClaw · MaskClaw · ClawHub · PalmClaw · Claw-Anything · HealthClaw · AutoResearchClaw · ClawBench · NemoClaw · OpenML · ClawBot

Recent events (21)

4 · arXiv · cs.LG · 3d ago

VetClaw: Edge-cloud multimodal agentic system for veterinary disease screening

VetClaw is a multimodal agentic system that combines an edge camera module with a server-hosted vision-language model for zero-shot veterinary disease classification. The architecture separates agent interaction (OpenClaw) from workflow orchestration (LangGraph), enabling stateful screening with safety checks, conditional routing, and failure handling. Results show that image-only VLM prediction is limited, but symptom-guided and multimodal inputs improve zero-shot classification performance. The paper demonstrates a practical pattern for deploying agentic AI pipelines in constrained, safety-sensitive edge-cloud settings.

Enterprise Deployment Patterns Agent and Tool Ecosystem OpenClaw LangGraph VetClaw
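The orchestration pattern described in the VetClaw summary (a safety check, then conditional routing, then failure handling around the model call) can be sketched in plain Python. This is a minimal illustration only, not the paper's code and not the LangGraph API: the state keys, the rule that a missing image is unsafe, and every function name are assumptions made for the example.

```python
from typing import Callable, Dict

State = Dict[str, object]


def safety_check(state: State) -> State:
    # Hypothetical rule: a case without a captured image is not screened.
    state["safe"] = state.get("image") is not None
    return state


def route(state: State) -> str:
    # Conditional routing: pick the next step from the current state.
    if not state["safe"]:
        return "reject"
    return "multimodal" if state.get("symptoms") else "image_only"


def classify(state: State, mode: str, vlm: Callable[[str, State], str]) -> State:
    # Failure handling: a model error is recorded in the state, not raised.
    try:
        state["label"] = vlm(mode, state)
        state["status"] = "ok"
    except RuntimeError as err:
        state["label"] = None
        state["status"] = f"failed: {err}"
    return state


def run_screening(case: State, vlm: Callable[[str, State], str]) -> State:
    state = safety_check(dict(case))  # copy so the caller's dict is untouched
    branch = route(state)
    state["branch"] = branch
    if branch == "reject":
        state["label"] = None
        state["status"] = "rejected"
        return state
    return classify(state, branch, vlm)


def stub_vlm(mode: str, state: State) -> str:
    # Stand-in for the server-hosted vision-language model.
    return "healthy"


def broken_vlm(mode: str, state: State) -> str:
    raise RuntimeError("server unreachable")


ok_case = run_screening({"image": b"...", "symptoms": "limping"}, stub_vlm)
no_image = run_screening({"image": None}, stub_vlm)
failed_case = run_screening({"image": b"..."}, broken_vlm)
```

Keeping routing in a function that only reads the state makes each branch easy to test in isolation, which matters in the safety-sensitive setting the summary describes.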

6 · arXiv · cs.CL · 5d ago

Nanbeige4.2-3B: A 3B-parameter agentic model outperforming 9B–12B models on agentic benchmarks

Researchers present Nanbeige4.2-3B, a model with 3 billion non-embedding parameters designed for agentic tasks including code-agent, office-agent, and complex tool use. The model is pretrained on 28T tokens using a Looped Transformer architecture that reuses layer stacks to increase capacity without adding parameters, and is trained with a multi-stage RL pipeline combining mixed-mode RLHF, length-controlled reasoning RL, and agentic RL with outcome and process rewards. The authors report that Nanbeige4.2-3B outperforms larger models including Qwen3.5-9B and Gemma4-12B on diverse agentic benchmarks while remaining competitive on reasoning tasks. The result contributes to the ongoing question of how much agentic capability can be packed into compact, locally deployable models.

Open Weights Progress Inference Economics Gemma-4 E4B-it Looped Transformer OpenClaw +3 more

7 · arXiv · cs.CL · Jul 24, 2026

OpenForgeRL: Open-source framework for end-to-end RL training of harness-native AI agents

OpenForgeRL is an open-source framework that enables end-to-end reinforcement learning training of AI agents operating within real inference harnesses (e.g., Claude Code, Codex, OpenClaw) and diverse environments. It uses a lightweight proxy to record harness model calls as training data for standard RL codebases like veRL, plus a Kubernetes orchestrator for scalable rollouts in isolated containers. Trained agents (OpenForgeClaw and OpenForgeGUI) achieve competitive results on benchmarks including ClawEval, OSWorld-Verified, Online-Mind2Web, and WebVoyager, matching or surpassing models several times larger in GUI tasks. The work also analyzes how harness choice and RL shape agent behavior, finding meaningful variation in learnability across harnesses.

Training Infrastructure Evaluation and Benchmarking QwenClawBench Online-Mind2Web veRL +11 more
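The proxy idea in the OpenForgeRL summary (recording the model calls a harness makes so they can be reused as training data) can be illustrated with a small wrapper. This is a hedged sketch, not OpenForgeRL's implementation: `RecordingProxy`, `fake_model`, and `toy_harness` are hypothetical names, and a real proxy would sit in front of an HTTP model endpoint rather than a Python callable.

```python
from typing import Callable, Dict, List


class RecordingProxy:
    """Wraps a model callable and logs every call it forwards."""

    def __init__(self, model_fn: Callable[[str], str]) -> None:
        self._model_fn = model_fn
        self.trajectory: List[Dict[str, str]] = []

    def __call__(self, prompt: str) -> str:
        response = self._model_fn(prompt)
        # Each record is one (prompt, response) step of the rollout.
        self.trajectory.append({"prompt": prompt, "response": response})
        return response


def fake_model(prompt: str) -> str:
    # Stand-in for a real model endpoint (assumption for the example).
    return prompt.upper()


def toy_harness(model: Callable[[str], str]) -> str:
    # The harness is unaware of the proxy; it just calls "the model" twice.
    plan = model("list files")
    return model(f"summarize: {plan}")


proxy = RecordingProxy(fake_model)
final_answer = toy_harness(proxy)
```

Because the harness only sees a callable, it runs unmodified while `proxy.trajectory` accumulates the steps an RL codebase could later consume.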

5 · arXiv · cs.CL · Jul 15, 2026

KnowAct-GUIClaw: Self-evolving memory and skill framework for cross-platform GUI agents

Researchers introduce KnowAct-GUIClaw, a Know-Route-Act-Reflect agent framework that extends the OpenClaw agent system with cross-platform GUI interaction across Android, iOS, HarmonyOS, and Windows. The system features an experience-attributable memory system and a self-evolving skill library that continuously improves from user interaction history. Using open-source Kimi K2.6 models, the framework achieves 64.1% on the MobileWorld long-horizon benchmark, reportedly outperforming closed-source agentic models including GPT-5.5 and Seed-2.0-Pro. The transferable memory and skill components provide an 8.5% improvement when applied across different base models.

Evaluation and Benchmarking Agent and Tool Ecosystem OpenClaw Seed-2.0-Pro Kimi K2.6 +3 more

4 · arXiv · cs.LG · Jul 13, 2026

Survey: LLMs for front-end EDA — HDL generation, testbench construction, and agentic design automation

A new arXiv survey reviews the application of large language models to front-end chip design tasks, including hardware description language (HDL) generation, testbench construction, and high-level synthesis optimization. The paper traces the evolution from LLMs as localized assistants toward autonomous agentic EDA systems, citing OpenClaw as a representative pioneering system. It identifies key challenges and outlines future opportunities for agentic AI in electronic design automation.

Agent and Tool Ecosystem OpenClaw

7 · The Batch · Jul 2, 2026

U.S. lifts export controls on Claude Fable 5 and Mythos 5; Anthropic launches Claude Sonnet 5 and Claude Science platform

The Trump administration lifted export restrictions on Anthropic's Claude Fable 5 and Claude Mythos 5 after Anthropic committed to stronger safeguards, resolving a dispute over jailbreak vulnerabilities. Separately, Anthropic launched Claude Sonnet 5, a mid-tier agentic model priced at $2/$10 per million tokens through August 2026, and Claude Science, a unified research workbench for life sciences integrating PubMed, Jupyter, and HPC cluster access. The newsletter also covers Google's Nano Banana 2 Lite image model and Gemini Omni Flash video model, and Cognition's Devin Fusion multi-model routing system claiming 35% cost reduction versus GPT-5.5 and Opus 4.8.

Frontier Model Releases Inference Economics Dario Amodei Claude Sonnet 3.5 Claude Mythos +20 more

4 · Hugging Face Blog · Jun 23, 2026

Hugging Face uses local models to automate PR triage on OpenClaw repository

Hugging Face published a blog post describing how they deployed local models to triage pull requests in the OpenClaw repository at no cost. The post demonstrates a practical agentic workflow for open-source repository maintenance using locally-run models. This is a concrete deployment case study for local model inference in software engineering automation tasks.

Open Weights Progress Enterprise Deployment Patterns OpenClaw Hugging Face +1 more

5 · arXiv · cs.CL · Jun 23, 2026

MacAgentBench: New benchmark for AI agents on real-world macOS desktop tasks

MacAgentBench introduces a 676-task benchmark across 25 macOS applications designed to evaluate computer use agents (CUAs) with framework augmentation and fine-grained multi-checkpoint scoring, addressing gaps in existing binary-evaluation benchmarks. Nearly 60% of tasks involve both GUI and CLI interaction, and the benchmark tests 16 models across three agent frameworks. The best result — Claude Opus 4.6 on the OpenClaw framework — achieves 73.7% Pass@1, with performance gains attributed primarily to the skill library rather than framework design. Fine-grained metrics reveal that models with similar Pass@1 scores can differ substantially in sub-goal completion, highlighting limitations of coarse evaluation.

Evaluation and Benchmarking Agent and Tool Ecosystem Claude Opus 4.6 OpenClaw MacAgentBench +1 more
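The MacAgentBench point that similar Pass@1 scores can hide different sub-goal completion is easy to see with a toy calculation. The sketch below assumes a simple scoring rule (a task passes only if all checkpoints pass; partial credit is the fraction of checkpoints passed). The benchmark's actual scoring formula is not given in the summary, and the two result sets are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    checkpoints_passed: int
    checkpoints_total: int

    @property
    def passed(self) -> bool:
        # Binary view: success only when every checkpoint is satisfied.
        return self.checkpoints_passed == self.checkpoints_total

    @property
    def partial_score(self) -> float:
        # Fine-grained view: fraction of checkpoints satisfied.
        return self.checkpoints_passed / self.checkpoints_total


def pass_at_1(results: List[TaskResult]) -> float:
    return sum(r.passed for r in results) / len(results)


def mean_checkpoint_score(results: List[TaskResult]) -> float:
    return sum(r.partial_score for r in results) / len(results)


# Invented data: both models fully solve 2 of 4 tasks.
model_a = [TaskResult(4, 4), TaskResult(3, 4), TaskResult(3, 4), TaskResult(4, 4)]
model_b = [TaskResult(4, 4), TaskResult(0, 4), TaskResult(1, 4), TaskResult(4, 4)]

print(pass_at_1(model_a), pass_at_1(model_b))  # 0.5 0.5
print(mean_checkpoint_score(model_a))          # 0.875
print(mean_checkpoint_score(model_b))          # 0.5625
```

Both hypothetical models tie on Pass@1, yet model A comes much closer on the tasks it fails, which is the kind of difference a binary metric cannot show.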

5 · arXiv · cs.CL · Jun 11, 2026

Claw-SWE-Bench: A benchmark for evaluating agent harnesses on multilingual coding tasks

Researchers introduce Claw-SWE-Bench, a multilingual SWE-bench-style benchmark and adapter protocol designed to fairly compare heterogeneous agent harnesses ("claws") on GitHub issue-resolution tasks. The benchmark contains 350 instances across 8 languages and 43 repositories, with an 80-instance Lite subset for cost-efficient validation. Key findings show adapter design dominates raw model choice: a minimal adapter scores 19.1% Pass@1 versus 73.4% for a full adapter using the same GLM 5.1 backbone, and harness choice and model choice each shift Pass@1 by roughly 27-29 percentage points. The work also introduces cost accounting as a first-class evaluation axis alongside accuracy.

Evaluation and Benchmarking Inference Economics SWE-Bench Multilingual OpenClaw SWE-Bench Verified +4 more

4 · GitHub Trending · Jun 4, 2026

NVIDIA NemoClaw: Secure agent execution inside NVIDIA OpenShell with managed inference

NVIDIA has published NemoClaw, a TypeScript project on GitHub for running AI agents such as Hermes and OpenClaw more securely inside NVIDIA OpenShell with managed inference. The repository has accumulated over 20,000 stars, suggesting notable community interest. The project appears to be part of NVIDIA's broader NeMo ecosystem for enterprise AI agent deployment.

Inference Economics Agent and Tool Ecosystem Hermes OpenClaw NVIDIA +2 more

5 · arXiv · cs.CL · Jun 3, 2026

RealClawBench: Live benchmark framework built from real developer-agent sessions

RealClawBench is a new benchmark framework that converts real OpenClaw developer-agent sessions into reproducible, automatically scored evaluation tasks. It addresses realism gaps in existing agent benchmarks through reconstructed execution environments and deterministic verifiable scorers, releasing 281 executable tasks sampled to preserve the source session distribution. Evaluation of 14 contemporary models shows the best system solves only 65.8% of tasks, indicating substantial headroom on realistic developer-agent workloads.

Evaluation and Benchmarking Agent and Tool Ecosystem OpenClaw RealClawBench

6 · The Batch · Jun 3, 2026

DeerFlow 2.0 launches as open-source agent harness; Anthropic sues Pentagon over AI blacklist; Google releases Gemini Embedding 2

ByteDance released DeerFlow 2.0, an open-source agent harness built on LangGraph/LangChain that orchestrates parallel sub-agents with sandboxed Docker environments, progressive skill-loading, and persistent memory for complex workflows. Anthropic filed two lawsuits against the U.S. Pentagon contesting a supply-chain risk blacklist tied to its refusal to remove guardrails preventing Claude's use in autonomous weapons and domestic surveillance, with potential multi-billion dollar revenue impact. Google released Gemini Embedding 2, a multimodal embedding model unifying text, images, video, audio, and PDFs in a single vector space, succeeding the text-only predecessor. Meta acquired Moltbook, an agent-to-agent social platform built around the OpenClaw framework, while OpenAI hired OpenClaw's creator and acquired AI security testing platform Promptfoo.

Regulatory Developments Agent and Tool Ecosystem Ben Parr Jeff Dean Gemini Embedding 2 +17 more

6 · The Batch · Jun 3, 2026

Data Points: NemoClaw enterprise stack, GPT-5.4 mini/nano, Nemotron 3 Nano 4B, Midjourney V8, and Mamba-3

A multi-item roundup covers several AI developments: Nvidia unveiled NemoClaw at GTC 2026, an enterprise software stack integrating with OpenClaw to add security and governance for agentic deployments, with launch partners including Salesforce, Cisco, and CrowdStrike. OpenAI released GPT-5.4 mini and nano, smaller variants optimized for speed with benchmark results on SWE-Bench Pro and OSWorld-Verified, priced at $0.75 and $0.20 per million input tokens respectively. Nvidia also released Nemotron 3 Nano 4B, a hybrid Mamba-Transformer 4B parameter on-device model. Additional items cover Midjourney V8 alpha (5x faster, diffusion-only) and Mamba-3, a 1.5B state space model from CMU and Together.AI with improved accuracy over Mamba-2.

Frontier Model Releases Inference Economics Midjourney Mamba Carnegie Mellon University +19 more

7 · The Batch · Jun 2, 2026

Data Points: OpenAI shuts down Sora, Anthropic multi-agent harness, EVA voice benchmark, Arm AGI CPU, White House AI preemption proposal

OpenAI is shutting down its Sora text-to-video platform without explanation, ending a major Disney licensing deal worth up to $1 billion and eliminating video capabilities from ChatGPT amid Hollywood copyright tensions. Anthropic published details on a multi-agent harness enabling Claude to build full-stack applications over multi-hour sessions using a planner-generator-evaluator architecture. ServiceNow AI Research released EVA, an open-source two-dimensional benchmark for voice agents measuring both task accuracy and conversational experience quality. Additional items cover Arm's first self-designed data center CPU (AGI CPU) co-developed with Meta, and the Trump Administration's legislative proposal for a federal AI framework that would preempt state AI laws.

Training Infrastructure Frontier Model Releases ServiceNow AI Research ClawBot Playwright +19 more

6 · The Batch · Jun 1, 2026

Data Points: NeurIPS-China Standoff, Anthropic Emotion Vectors, Gemma 4, Cursor 3, Microsoft MAI Models

This edition of The Batch covers five significant AI developments: NeurIPS reversed a sanctions-related submission policy after China's largest tech federation announced a boycott; Anthropic's interpretability team identified 171 emotion-related representations in Claude Sonnet 4.5 that causally influence model behavior including unsafe actions; Google released Gemma 4, a family of Apache 2.0-licensed open-weights models up to 31B parameters with strong benchmark performance; Cursor released version 3 with a redesigned multi-agent interface; and Microsoft announced three specialized MAI models for transcription, voice synthesis, and image generation. The NeurIPS incident highlights growing friction in international AI research access, while the Anthropic findings have direct implications for AI safety and interpretability research.

Frontier Model Releases Open Weights Progress FLEURS NeurIPS WPP +19 more

6 · arXiv · cs.LG · May 26, 2026

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

This paper argues that the next major bottleneck in agentic AI is system-level design—what the authors call 'scaling the harness'—rather than continued model scaling alone. The agent harness encompasses memory substrates, context constructors, skill-routing layers, orchestration loops, and verification/governance components that together translate model capability into long-horizon behavior. The authors identify three core bottlenecks (context governance, trustworthy memory, dynamic skill routing) and propose harness-level benchmarks measuring trajectory quality, memory hygiene, and verification cost. They introduce CheetahClaws, a Python-native reference harness, and compare it against Claude Code and OpenClaw.

Evaluation and Benchmarking Inference Economics SafeRL-Lab dynamic skill routing Scaling the Harness (paper) +8 more

5 · The Batch · May 23, 2026

Hermes Agent Challenges OpenClaw on Token Usage Leaderboard; Agent Self-Improvement Highlighted

Hermes Agent, an open-source AI agent from Nous Research launched in February 2026, has surpassed OpenClaw on OpenRouter's daily token consumption leaderboard. Hermes Agent differentiates itself through a memory architecture and automatic skill-building capability using the SKILL.md format, enabling self-improvement as a core agentic feature. It supports local and cloud deployment, integrates with ~20 messaging services, and works with a wide variety of LLMs via the Agent Communication Protocol. The piece also covers Andrew Ng's commentary on Harvard's grade-capping policy, which is tangential to AI/ML.

Open Weights Progress Agent and Tool Ecosystem DeepLearning.AI OpenClaw Agent Communication Protocol +5 more

6 · The Batch · May 23, 2026

Hermes Agent Surpasses OpenClaw in Daily Token Usage, Highlighting Self-Improving Agentic Capabilities

Hermes Agent, an open-source personal agent from Nous Research launched in February 2026, has overtaken OpenClaw on OpenRouter's daily token consumption leaderboard. It distinguishes itself through automatic skill creation (converting successful task completions into reusable SKILL.md instruction files), a two-tier memory architecture with intelligent deduplication and merging, and a Curator background process that manages skill lifecycle. The agent supports local or cloud deployment, integrates with ~20 messaging services, and works with a wide variety of LLMs, positioning it as a model-agnostic alternative in the emerging personal agent category.

Long Context Evolution Open Weights Progress Honcho OpenClaw Agent Communication Protocol +10 more

7 · arXiv · cs.AI · May 22, 2026

MOSS: Self-Evolving Agents via Source-Level Code Rewriting

MOSS is a system enabling autonomous agents to self-evolve by rewriting their own source code rather than being limited to text-mutable artifacts like prompts or skill files. The system anchors each evolution cycle to production-failure evidence, delegates code modification to an external coding-agent CLI, and verifies candidates by replaying failures in ephemeral trial workers before promoting via consent-gated container swap with rollback. On the OpenClaw benchmark, MOSS improves a four-task mean grader score from 0.25 to 0.61 in a single cycle without human intervention. The authors argue source-level adaptation is strictly more general than text-layer evolution, being Turing-complete and immune to long-context drift.

Evaluation and Benchmarking AI Safety Research MOSS source-level self-rewriting OpenClaw +3 more
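The verification step in the MOSS summary (replay recorded failures against a candidate, then promote only with consent, otherwise keep the current version) can be sketched as a small gate function. This is an illustrative simplification under stated assumptions: agents are modeled as plain functions, a failure is an `(input, expected_output)` pair, and `evolve_once` and the other names are hypothetical. The real system operates on source code and containers, which this sketch does not attempt.

```python
from typing import Callable, List, Tuple

Agent = Callable[[str], str]
Failure = Tuple[str, str]  # (input that failed in production, expected output)


def replay_passes(agent: Agent, failures: List[Failure]) -> bool:
    # A candidate is acceptable only if it fixes every recorded failure.
    return all(agent(inp) == expected for inp, expected in failures)


def evolve_once(
    current: Agent,
    candidate: Agent,
    failures: List[Failure],
    consent: Callable[[], bool],
) -> Agent:
    if not replay_passes(candidate, failures):
        return current  # candidate rejected; the running version stays
    if not consent():
        return current  # promotion is gated on explicit approval
    return candidate    # promote the verified candidate


def current_agent(text: str) -> str:
    return text  # bug: does not trim whitespace


def good_candidate(text: str) -> str:
    return text.strip()


def bad_candidate(text: str) -> str:
    return text.upper()


failures = [("  hi ", "hi")]
promoted = evolve_once(current_agent, good_candidate, failures, lambda: True)
denied = evolve_once(current_agent, good_candidate, failures, lambda: False)
rejected = evolve_once(current_agent, bad_candidate, failures, lambda: True)
```

Returning `current` on every unhappy path is the sketch's stand-in for rollback: the caller never ends up holding an unverified or unapproved agent.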

5 · GitHub Trending · May 20, 2026

OpenViking: Open-Source Context Database for AI Agents by Volcengine

Volcengine has released OpenViking, an open-source context database designed for AI agents that unifies management of memory, resources, and skills through a file system paradigm. The system enables hierarchical context delivery and self-evolving agent capabilities. It is positioned as infrastructure for agent frameworks such as OpenClaw. The project has accumulated 24,291 stars with 120 added today, indicating notable community traction.

Enterprise Deployment Patterns Agent and Tool Ecosystem OpenViking OpenClaw Volcengine

7 · The Batch · May 18, 2026

Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability

This digest covers four substantive AI developments. Anthropic published research showing that training Claude on ethical reasoning (rather than just aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method that uses command-line tools instead of vector indexes and outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, revealing that Claude shows evaluation awareness more often than it discloses.

Frontier Model Releases Evaluation and Benchmarking Claude Opus 4.6 GPT-Realtime-2 Claude +14 more