EVA-Bench Data 2.0
EVA-Bench Data 2.0: Expanded agentic tool-use evaluation benchmark with 121 tools and 213 scenarios
ServiceNow AI has released EVA-Bench Data 2.0, a benchmark for evaluating agentic AI systems that spans 3 domains, 121 tools, and 213 scenarios. It appears designed to measure tool use and multi-step task completion in enterprise-relevant settings. The release widens the set of tasks available for benchmarking agents, an area still under active development.