Almanac
company

Atla

companyactiveatla-a5c0ea1a·1 events·first seen 28d ago

Aliases: Atla

Co-occurring entities

More like this (12)

Recent events (1)

5Hugging Face Blog·28d ago·source ↗

Judge Arena: Benchmarking LLMs as Evaluators

Hugging Face and Atla have launched Judge Arena, a platform for benchmarking large language models in their role as automated evaluators. The initiative uses an Elo-based ranking system to compare how well different LLMs judge the quality of model outputs, addressing the growing reliance on LLM-as-judge paradigms in evaluation pipelines. This fills a meta-evaluation gap: as LLM judges become standard practice, understanding their relative reliability and biases becomes critical infrastructure for the field.