Almanac
benchmark

GAIA2

benchmarkactivegaia2-6a1717ee·1 events·first seen 28d ago

Aliases: GAIA2

Co-occurring entities

More like this (12)

Recent events (1)

6Hugging Face Blog·28d ago·source ↗

Gaia2 and ARE: Empowering the community to study agents

Hugging Face has released Gaia2 and the Agent Reasoning Evaluation (ARE) framework, aimed at enabling the research community to study and benchmark AI agents. The post describes new tools and datasets for evaluating agent capabilities, building on the original GAIA benchmark. This represents an expansion of the agent evaluation ecosystem with community-oriented tooling.