Entity · technique

multi-level agent evaluation

technique · active · multi-level-agent-evaluation-3f2a14fc · 1 event · first seen May 22, 2026

Aliases: multi-level agent evaluation

More like this (12)

Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback
multi-turn agent benchmarks
agent-to-agent evaluation protocol
Who Grades the Grader? Co-Evolving Evaluation Metrics and Skills for Self-Improving LLM Agents
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
third-party AI evaluations
multi-agent cooperative framework
Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems
multi-agent systematizer
Do Agent Optimizers Compound? A Continual-Learning Evaluation on Terminal-Bench 2.0
Super-Agent benchmark
multimodal agents

Recent events (1)

6 · arXiv · cs.CL · May 22, 2026

Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

Agentic CLEAR is an automatic evaluation framework for LLM-based agentic systems that analyzes behavior at three levels of granularity: system, trace, and node. Unlike existing tools, which either rely on static error taxonomies or focus only on observability, it generates textual insights dynamically, sits above the observability layer, and provides an accessible UI. Experiments across four benchmarks and seven agentic settings show that its output aligns strongly with human-annotated errors and is predictive of task success rates.
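
The three granularity levels can be pictured as nested data: nodes (individual steps) roll up into traces (single runs), and traces roll up into a system-wide view. The following is a minimal illustrative sketch only. The `Node` and `Trace` classes, the three functions, and the issue strings are hypothetical names invented here, not Agentic CLEAR's API, and the hand-written issue labels merely stand in for the textual insights the framework is described as generating dynamically.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step inside a run, e.g. a single LLM or tool call (hypothetical)."""
    name: str
    issues: list[str] = field(default_factory=list)  # stand-in issue labels


@dataclass
class Trace:
    """One end-to-end run of the agent on a task (hypothetical)."""
    task_id: str
    nodes: list[Node]
    succeeded: bool


def node_level(trace: Trace) -> list[tuple[str, list[str]]]:
    """Node level: the issues attached to each individual step."""
    return [(node.name, node.issues) for node in trace.nodes]


def trace_level(trace: Trace) -> Counter:
    """Trace level: how often each issue occurs within one run."""
    return Counter(issue for node in trace.nodes for issue in node.issues)


def system_level(traces: list[Trace]) -> Counter:
    """System level: the number of runs in which each issue appears."""
    counts: Counter = Counter()
    for trace in traces:
        # set() so an issue counts once per run, however often it repeats.
        counts.update(set(trace_level(trace)))
    return counts


# Made-up example data, for illustration only.
traces = [
    Trace("t1", [Node("plan"), Node("search", ["wrong tool arguments"])], False),
    Trace("t2", [Node("plan", ["skipped constraint"]),
                 Node("search", ["wrong tool arguments"])], False),
    Trace("t3", [Node("plan"), Node("search")], True),
]

for issue, n_runs in system_level(traces).most_common():
    print(f"{issue}: seen in {n_runs} of {len(traces)} runs")
```

Counting an issue once per run at the system level is a design choice of this sketch: it surfaces problems that recur across runs and keeps one noisy trace from dominating the summary.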

Evaluation and Benchmarking · AI Safety Research · Agentic CLEAR · multi-level agent evaluation · LLM agents