Entity · benchmark

VAKRA

benchmark · active · vakra-3a9922fc · 1 event · first seen May 18, 2026

Aliases: VAKRA

Co-occurring entities

IBM Research · Hugging Face

More like this (12)

veRL · Vectara · VideoVAE+ · O-VAD · VisA · Vexa · VQ-VAE · SmolVLA · RAGAS · RWKV · DocVQA · CXR-VQA

Recent events (1)

5 · Hugging Face Blog · May 18, 2026

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

IBM Research presents an analysis of VAKRA, a benchmark designed to evaluate agentic AI systems on reasoning and tool use capabilities. The post examines how agents fail across different task categories, surfacing systematic failure modes in multi-step reasoning and tool invocation. The analysis provides diagnostic insights into where current agent architectures break down under realistic task conditions.

Evaluation and Benchmarking · AI Safety Research · IBM Research · Hugging Face · VAKRA