Entity · benchmark

IT-Bench

benchmark · active · it-bench-21ca184a · 1 event · first seen May 18, 2026

Aliases: IT-Bench

More like this (12)

Terminal-Bench · ESI-Bench · MT-Bench · Int-Bench · ATE-Bench · TriggerBench · ITBench-AA · FinBench · τ²-Bench · PaperBench · EdgeBench · HealthBench

Recent events (1)

5 · Hugging Face Blog · May 18, 2026

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

IBM Research and UC Berkeley have released IT-Bench and MAST, a benchmark suite and a diagnostic framework, respectively, for evaluating why AI agents fail in enterprise IT environments. The work targets realistic IT operations tasks such as incident response, service management, and infrastructure automation. MAST categorizes failure modes systematically, giving a structured taxonomy for understanding agent shortcomings beyond simple pass/fail metrics. Together they address a gap in enterprise-focused agent evaluation, where general benchmarks often fail to capture domain-specific complexity.

IBM Research · UC Berkeley · IT-Bench