Almanac
benchmark

IT-Bench

benchmarkactiveit-bench-21ca184a·1 events·first seen 1mo ago

Aliases: IT-Bench

Co-occurring entities

More like this (12)

Recent events (1)

5Hugging Face Blog·1mo ago·source ↗

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

IBM Research and UC Berkeley have released IT-Bench and MAST, a benchmark suite and diagnostic framework aimed at evaluating why AI agents fail in enterprise IT environments. The work targets realistic IT operations tasks such as incident response, service management, and infrastructure automation. By categorizing failure modes systematically, MAST provides a structured taxonomy for understanding agent shortcomings beyond simple pass/fail metrics. This addresses a gap in enterprise-focused agent evaluation, where general benchmarks often fail to capture domain-specific complexity.