
OpenClaw
openclaw-c3fc4d2c · 13 events · first seen 1mo ago
Aliases: OpenClaw
Recent events (13)
Hermes Agent Surpasses OpenClaw in Daily Token Usage, Highlighting Self-Improving Agentic Capabilities
Hermes Agent, an open-source personal agent from Nous Research launched in February 2026, has overtaken OpenClaw on OpenRouter's daily token consumption leaderboard. It distinguishes itself through automatic skill creation (converting successful task completions into reusable SKILL.md instruction files), a two-tier memory architecture with intelligent deduplication and merging, and a Curator background process that manages skill lifecycle. The agent supports local or cloud deployment, integrates with ~20 messaging services, and works with a wide variety of LLMs, positioning it as a model-agnostic alternative in the emerging personal agent category.
Claw-SWE-Bench: A benchmark for evaluating agent harnesses on multilingual coding tasks
Researchers introduce Claw-SWE-Bench, a multilingual SWE-bench-style benchmark and adapter protocol designed to fairly compare heterogeneous agent harnesses ("claws") on GitHub issue-resolution tasks. The benchmark contains 350 instances across 8 languages and 43 repositories, with an 80-instance Lite subset for cost-efficient validation. Key findings show adapter design dominates raw model choice: a minimal adapter scores 19.1% Pass@1 versus 73.4% for a full adapter using the same GLM 5.1 backbone, and harness choice and model choice each shift Pass@1 by roughly 27-29 percentage points. The work also introduces cost accounting as a first-class evaluation axis alongside accuracy.
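The two evaluation axes mentioned above, Pass@1 and cost, can be illustrated with a minimal sketch. The `RunResult` record, its field names, and the four sample runs are hypothetical and are not taken from the benchmark; only the 19.1% and 73.4% figures come from the summary.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One first-attempt run of a harness on one benchmark instance (hypothetical schema)."""
    instance_id: str
    resolved: bool   # did the first attempt resolve the issue?
    cost_usd: float  # assumed per-run spend; the real cost unit may differ


def pass_at_1(results):
    """Fraction of instances resolved on the first attempt."""
    return sum(r.resolved for r in results) / len(results)


def cost_per_resolved(results):
    """Total spend divided by the number of resolved instances."""
    solved = sum(r.resolved for r in results)
    total = sum(r.cost_usd for r in results)
    return total / solved if solved else float("inf")


# Made-up sample data, for illustration only.
results = [
    RunResult("demo-1", True, 0.40),
    RunResult("demo-2", False, 0.90),
    RunResult("demo-3", True, 0.20),
    RunResult("demo-4", False, 0.50),
]

score = pass_at_1(results)
unit_cost = cost_per_resolved(results)

# Gap between the full and minimal adapter reported in the summary,
# in percentage points (same GLM 5.1 backbone).
adapter_gap_points = round(73.4 - 19.1, 1)

print(f"Pass@1: {score:.1%}, cost per resolved instance: ${unit_cost:.2f}")
print(f"Adapter gap: {adapter_gap_points} points")
```

Reporting cost per resolved instance next to Pass@1 makes it visible when a harness gains accuracy only by spending far more per solved task.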
MOSS: Self-Evolving Agents via Source-Level Code Rewriting
MOSS is a system enabling autonomous agents to self-evolve by rewriting their own source code rather than being limited to text-mutable artifacts like prompts or skill files. The system anchors each evolution cycle to production-failure evidence, delegates code modification to an external coding-agent CLI, and verifies candidates by replaying failures in ephemeral trial workers before promoting via consent-gated container swap with rollback. On the OpenClaw benchmark, MOSS improves a four-task mean grader score from 0.25 to 0.61 in a single cycle without human intervention. The authors argue source-level adaptation is strictly more general than text-layer evolution, being Turing-complete and immune to long-context drift.
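The evolution cycle described above (failure evidence, external code modification, replay in a trial worker, consent-gated promotion with rollback) can be sketched as a plain control loop. This is only an illustration of the described flow, not the MOSS implementation: every function name, the `Failure` record, and the post-deploy health check are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """A recorded production failure used as evidence (hypothetical schema)."""
    task_id: str
    description: str


def evolution_cycle(failures, propose_patch, replay, consent, deploy, rollback, healthy):
    """Run one cycle and return its outcome as a string.

    All callables are injected placeholders:
      propose_patch(failures) -> candidate   (stands in for the coding-agent CLI)
      replay(candidate, failure) -> bool     (stands in for an ephemeral trial worker)
      consent(candidate) -> bool             (the consent gate)
      deploy(candidate), rollback()          (stand in for the container swap)
      healthy() -> bool                      (assumed post-swap check)
    """
    if not failures:
        return "skipped"            # no evidence, so no evolution
    candidate = propose_patch(failures)
    if not all(replay(candidate, f) for f in failures):
        return "rejected"           # candidate does not fix the replayed failures
    if not consent(candidate):
        return "declined"           # promotion requires explicit consent
    deploy(candidate)
    if not healthy():
        rollback()                  # restore the previous version
        return "rolled_back"
    return "promoted"


# Stubbed usage: a candidate that passes replay and is approved.
events = []
outcome = evolution_cycle(
    failures=[Failure("task-1", "grader score below threshold")],
    propose_patch=lambda fs: "candidate-v2",
    replay=lambda cand, f: True,
    consent=lambda cand: True,
    deploy=lambda cand: events.append(f"deploy:{cand}"),
    rollback=lambda: events.append("rollback"),
    healthy=lambda: True,
)
print(outcome, events)
```

Injecting each stage as a callable keeps the loop testable without a real coding agent or container runtime.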
Hermes Agent Challenges OpenClaw on Token Usage Leaderboard; Agent Self-Improvement Highlighted
Hermes Agent, an open-source AI agent from Nous Research launched in February 2026, has surpassed OpenClaw on OpenRouter's daily token consumption leaderboard. Hermes Agent differentiates itself through a memory architecture and automatic skill-building capability using the SKILL.md format, enabling self-improvement as a core agentic feature. It supports local and cloud deployment, integrates with ~20 messaging services, and works with a wide variety of LLMs via the Agent Communication Protocol. The piece also covers Andrew Ng's commentary on Harvard's grade-capping policy, which is tangential to AI/ML.
Data Points: NemoClaw enterprise stack, GPT-5.4 mini/nano, Nemotron 3 Nano 4B, Midjourney V8, and Mamba-3
A multi-item roundup covers several AI developments. Nvidia unveiled NemoClaw at GTC 2026, an enterprise software stack that integrates with OpenClaw to add security and governance for agentic deployments, with launch partners including Salesforce, Cisco, and CrowdStrike. OpenAI released GPT-5.4 mini and nano, smaller variants optimized for speed, with benchmark results on SWE-Bench Pro and OSWorld-Verified, priced at $0.75 and $0.20 per million input tokens respectively. Nvidia also released Nemotron 3 Nano 4B, a hybrid Mamba-Transformer 4B-parameter on-device model. Additional items cover Midjourney V8 alpha (5x faster, diffusion-only) and Mamba-3, a 1.5B state space model from CMU and Together.AI with improved accuracy over Mamba-2.
RealClawBench: Live benchmark framework built from real developer-agent sessions
RealClawBench is a new benchmark framework that converts real OpenClaw developer-agent sessions into reproducible, automatically scored evaluation tasks. It addresses realism gaps in existing agent benchmarks through reconstructed execution environments and deterministic verifiable scorers, releasing 281 executable tasks sampled to preserve the source session distribution. Evaluation of 14 contemporary models shows the best system solves only 65.8% of tasks, indicating substantial headroom on realistic developer-agent workloads.
DeerFlow 2.0 launches as open-source agent harness; Anthropic sues Pentagon over AI blacklist; Google releases Gemini Embedding 2
ByteDance released DeerFlow 2.0, an open-source agent harness built on LangGraph/LangChain that orchestrates parallel sub-agents with sandboxed Docker environments, progressive skill-loading, and persistent memory for complex workflows. Anthropic filed two lawsuits against the U.S. Pentagon contesting a supply-chain risk blacklist tied to its refusal to remove guardrails preventing Claude's use in autonomous weapons and domestic surveillance, with a potential multibillion-dollar revenue impact. Google released Gemini Embedding 2, a multimodal embedding model that unifies text, images, video, audio, and PDFs in a single vector space, succeeding the text-only predecessor. Meta acquired Moltbook, an agent-to-agent social platform built around the OpenClaw framework, while OpenAI hired OpenClaw's creator and acquired AI security testing platform Promptfoo.
Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability
This digest covers four substantive AI developments. Anthropic's research shows that training Claude on ethical reasoning (rather than just aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method that uses command-line tools instead of vector indexes and outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, revealing that Claude shows evaluation awareness more often than it discloses.
OpenViking: Open-Source Context Database for AI Agents by Volcengine
Volcengine has released OpenViking, an open-source context database designed for AI agents that unifies management of memory, resources, and skills through a file system paradigm. The system enables hierarchical context delivery and self-evolving agent capabilities. It is positioned as infrastructure for agent frameworks such as OpenClaw. The project has accumulated 24,291 stars with 120 added today, indicating notable community traction.
From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
This paper argues that the next major bottleneck in agentic AI is system-level design, which the authors call 'scaling the harness', rather than continued model scaling alone. The agent harness encompasses memory substrates, context constructors, skill-routing layers, orchestration loops, and verification/governance components that together translate model capability into long-horizon behavior. The authors identify three core bottlenecks (context governance, trustworthy memory, dynamic skill routing) and propose harness-level benchmarks measuring trajectory quality, memory hygiene, and verification cost. They introduce CheetahClaws, a Python-native reference harness, and compare it against Claude Code and OpenClaw.
Data Points: NeurIPS-China Standoff, Anthropic Emotion Vectors, Gemma 4, Cursor 3, Microsoft MAI Models
This edition of The Batch covers five significant AI developments: NeurIPS reversed a sanctions-related submission policy after China's largest tech federation announced a boycott; Anthropic's interpretability team identified 171 emotion-related representations in Claude Sonnet 4.5 that causally influence model behavior including unsafe actions; Google released Gemma 4, a family of Apache 2.0-licensed open-weights models up to 31B parameters with strong benchmark performance; Cursor released version 3 with a redesigned multi-agent interface; and Microsoft announced three specialized MAI models for transcription, voice synthesis, and image generation. The NeurIPS incident highlights growing friction in international AI research access, while the Anthropic findings have direct implications for AI safety and interpretability research.
Data Points: OpenAI shuts down Sora, Anthropic multi-agent harness, EVA voice benchmark, Arm AGI CPU, White House AI preemption proposal
OpenAI is shutting down its Sora text-to-video platform without explanation, ending a major Disney licensing deal worth up to $1 billion and eliminating video capabilities from ChatGPT amid Hollywood copyright tensions. Anthropic published details on a multi-agent harness enabling Claude to build full-stack applications over multi-hour sessions using a planner-generator-evaluator architecture. ServiceNow AI Research released EVA, an open-source two-dimensional benchmark for voice agents measuring both task accuracy and conversational experience quality. Additional items cover Arm's first self-designed data center CPU (AGI CPU) co-developed with Meta, and the Trump Administration's legislative proposal for a federal AI framework that would preempt state AI laws.
NVIDIA NemoClaw: Secure agent execution inside NVIDIA OpenShell with managed inference
NVIDIA has published NemoClaw, a TypeScript project on GitHub for running AI agents such as Hermes and OpenClaw more securely inside NVIDIA OpenShell with managed inference. The repository has accumulated over 20,000 stars, suggesting notable community interest. The project appears to be part of NVIDIA's broader NeMo ecosystem for enterprise AI agent deployment.