RealClawBench

benchmark · active · realclawbench-9c381d0d · 1 event · first seen Jun 3, 2026

Aliases: RealClawBench

More like this (12)

ClawBench Claw-SWE-Bench UniClawBench QwenClawBench EnterpriseClawBench RoleBench SorryBench RepoBench CharacterBench FeatBench PhantomBench AdvBench

Recent events (1)

5 · arXiv · cs.CL · Jun 3, 2026

RealClawBench: Live benchmark framework built from real developer-agent sessions

RealClawBench is a new benchmark framework that converts real OpenClaw developer-agent sessions into reproducible, automatically scored evaluation tasks. It addresses realism gaps in existing agent benchmarks through reconstructed execution environments and deterministic verifiable scorers, releasing 281 executable tasks sampled to preserve the source session distribution. Evaluation of 14 contemporary models shows the best system solves only 65.8% of tasks, indicating substantial headroom on realistic developer-agent workloads.

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenClaw · RealClawBench