technique
proactive assistance evaluation
technique · active · provisional
proactive-assistance-evaluation-98265d9c · 1 event · first seen 22d ago
Aliases: proactive assistance evaluation
More like this (12)
proactive agent architecture · ProActEval · proactive AI agents · agent-to-agent evaluation protocol · AASIST · proactive documentation injection · multi-level agent evaluation · APPO: Agentic Procedural Policy Optimization · ProAct · Cranfield evaluation paradigm · AI-assisted human evaluation · Agentic System Monitoring Methodology
Recent events (1)
Claw-Anything: Benchmark for Always-On Personal Assistants with Broad Digital World Access
Claw-Anything is a new benchmark designed to evaluate LLM agents acting as always-on personal assistants with access to long-horizon activity histories, interdependent backend services, and multi-device GUI/CLI interaction. The benchmark simulates months of user activity to create complex, noisy world states and evaluates both reactive and proactive assistance. GPT-5.5 achieves only 34.5% pass@1, revealing a substantial capability gap versus prior narrower benchmarks. An accompanying automated data-generation pipeline produces 2,000 training environments and yields a 23.7% improvement over the base model.