IT-Bench
benchmark · active
it-bench-21ca184a · 1 event · first seen 1mo ago
Aliases: IT-Bench
Recent events (1)
IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST
IBM Research and UC Berkeley have released IT-Bench, a benchmark suite, and MAST, a diagnostic framework, to evaluate why AI agents fail in enterprise IT environments. The work targets realistic IT operations tasks such as incident response, service management, and infrastructure automation. By categorizing failure modes systematically, MAST provides a structured taxonomy for understanding agent shortcomings beyond simple pass/fail metrics. Together, the two address a gap in enterprise-focused agent evaluation, where general benchmarks often fail to capture domain-specific complexity.