EVA-Bench Data 2.0

benchmark · active · provisional · eva-bench-data-2-0-017e0be0 · 1 event · first seen 12d ago

Aliases: EVA-Bench Data 2.0

Recent events (1)

5 · Hugging Face Blog · 12d ago

EVA-Bench Data 2.0: Expanded agentic tool-use evaluation benchmark with 121 tools and 213 scenarios

ServiceNow AI has released EVA-Bench Data 2.0, an evaluation benchmark for agentic AI systems that spans 3 domains, 121 tools, and 213 scenarios. The benchmark appears designed to measure tool use and multi-step task completion in enterprise-relevant contexts. The release adds to the benchmarks available for evaluating agents, an area that remains under active development.