ITBench-AA
benchmark · active · provisional
itbench-aa-5a81ecca · 1 event · first seen 20d ago
Aliases: ITBench-AA
Recent events (1)
ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks
IBM Research and Artificial Analysis have released ITBench-AA, a benchmark that measures how AI agents perform on enterprise IT operations tasks. Frontier models evaluated on the benchmark score below 50%, which points to significant capability gaps in real-world IT automation. The benchmark appears to be the first focused specifically on agentic enterprise IT workflows, and it covers tasks relevant to site reliability engineering and IT operations.