ITBench-AA

Hugging Face Blog · 20 days ago

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks

IBM Research and Artificial Analysis have released ITBench-AA, a benchmark that measures how well AI agents handle enterprise IT tasks. Frontier models score below 50% on it, which points to significant capability gaps in real-world IT automation. The benchmark appears to be the first focused specifically on agentic enterprise IT workflows, covering tasks in site reliability engineering (SRE) and IT operations.