4 · Hugging Face Blog · 1mo ago

Object Detection Leaderboard on Hugging Face

Hugging Face has launched an object detection leaderboard to benchmark and compare models on standard detection tasks. The leaderboard provides a centralized evaluation platform for tracking progress in object detection across the community. This follows the pattern of Hugging Face expanding its evaluation infrastructure for specific ML subdomains.

Related events (8)

3 · Hugging Face Blog · 1mo ago

Guide to Setting Up a Hugging Face Leaderboard: Vectara Hallucination Leaderboard as Example

This Hugging Face blog post provides an end-to-end tutorial on creating custom leaderboards on the Hugging Face platform, using Vectara's hallucination leaderboard as a concrete example. It covers the technical setup process for hosting evaluation leaderboards, which are increasingly important infrastructure for tracking model capabilities. The post bridges tooling and evaluation concerns by showing how third-party organizations can publish standardized benchmarks on HF.

4 · Hugging Face Blog · 1mo ago

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

Hugging Face is hosting the Artificial Analysis LLM Performance Leaderboard, which tracks inference performance metrics such as latency, throughput, and cost across multiple LLM providers. The leaderboard offers a standardized comparison of how models perform in production deployment, rather than measuring capability alone. The collaboration brings infrastructure and deployment performance data into the Hugging Face ecosystem.

4 · Hugging Face Blog · 1mo ago

Introducing the Open Leaderboard for Hebrew LLMs

Hugging Face has launched an open leaderboard dedicated to evaluating large language models on Hebrew language tasks. The leaderboard aims to benchmark multilingual and Hebrew-specific models across standardized tasks to track progress in Hebrew NLP. This fills a gap in non-English language evaluation infrastructure.

5 · Hugging Face Blog · 1mo ago

Launching the Artificial Analysis Text to Image Leaderboard & Arena

Hugging Face and Artificial Analysis are launching a combined leaderboard and arena for evaluating text-to-image models. The leaderboard tracks quality, speed, and cost metrics across leading image generation models, while the arena component collects human preference votes for side-by-side comparisons. This provides a structured benchmark for comparing commercial and open-weight image generation systems.

5 · Hugging Face Blog · 1mo ago

The Open Agent Leaderboard

IBM Research and Hugging Face have launched the Open Agent Leaderboard, a public benchmark for evaluating AI agents across standardized tasks. The leaderboard aims to provide transparent, reproducible comparisons of open and proprietary agent systems. This initiative addresses the growing need for rigorous evaluation infrastructure as the agent ecosystem matures.

4 · Hugging Face Blog · 1mo ago

The State of Computer Vision at Hugging Face

Hugging Face published a survey of the computer vision ecosystem available through its platform as of early 2023, covering supported model architectures, tasks, datasets, and tooling. The post reviews progress in image classification, object detection, segmentation, and multimodal vision-language models integrated into the Transformers library. It serves as a reference for practitioners on which CV capabilities are accessible via the Hugging Face Hub and APIs.

5 · Hugging Face Blog · 1mo ago

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

Hugging Face has launched an open leaderboard specifically designed to benchmark hallucination rates across large language models. The effort aims to standardize evaluation of factual accuracy and confabulation tendencies, filling a gap in existing benchmarks that focus primarily on capability rather than reliability. The leaderboard is positioned as a community-driven, transparent resource for tracking model trustworthiness.

4 · Hugging Face Blog · 1mo ago

Introducing the Open Leaderboard for Japanese LLMs

Hugging Face has launched an open leaderboard specifically for evaluating large language models on Japanese language tasks. The leaderboard aims to provide standardized benchmarking for Japanese LLMs, filling a gap in multilingual evaluation infrastructure. This initiative supports the growing ecosystem of Japanese-language AI development and open evaluation practices.