VAKRA

benchmark · active · vakra-3a9922fc · 1 event · first seen 1 month ago

Aliases: VAKRA

Recent events (1)

Hugging Face Blog · 1 month ago

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

In this post, IBM Research analyzes VAKRA, a benchmark for evaluating agentic AI systems on reasoning and tool use. The post examines how agents fail across task categories and identifies systematic failure modes in multi-step reasoning and tool invocation, offering diagnostic insight into where current agent architectures break down under realistic task conditions.