Entity · benchmark

BigCodeArena

benchmark · active · bigcodearena-7fccc908 · 1 event · first seen May 19, 2026

Aliases: BigCodeArena

More like this (12)

BigCode BigCodeBench Game Arena Chatbot Arena Arena Code Arena.ai Code Arena WebDev BashArena TTS Arena Arena Search CodeGemma WebArena Big Bench Audio

Recent events (1)

5 · Hugging Face Blog · May 19, 2026

BigCodeArena: Judging code generations end to end with code executions

BigCodeArena is a new evaluation framework for code generation models that judges outputs by executing the generated code end to end, rather than relying on static metrics or human preference ratings alone. By running each generation and evaluating the actual execution results, the approach aims to give more reliable and objective assessments of coding model capabilities. Execution-based judging addresses known limitations of LLM-as-judge and human-annotation methods in code evaluation benchmarks.
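As a rough illustration of what judging by execution can look like, the sketch below runs a generated Python snippet in a separate process and compares its output with an expected value. It is not BigCodeArena's implementation: the names `run_candidate` and `judge`, the exact-match comparison on stdout, and the timeout value are all assumptions made for this example. A plain subprocess is also not a sandbox, so a real system would need proper isolation.

```python
import subprocess
import sys


def run_candidate(code: str, stdin_text: str = "", timeout_s: float = 5.0) -> dict:
    """Run one generated program in a separate Python process (hypothetical helper)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # execute the snippet with the current interpreter
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # stop runaway generations
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "error": "timeout"}
    return {
        "ok": proc.returncode == 0,  # a non-zero exit code means a crash or explicit failure
        "stdout": proc.stdout.strip(),
        "error": proc.stderr.strip(),
    }


def judge(code: str, expected_stdout: str) -> bool:
    """Assumed pass criterion: the program exits cleanly and prints the expected text."""
    result = run_candidate(code)
    return result["ok"] and result["stdout"] == expected_stdout
```

Under these assumptions, `judge("print(sum(range(5)))", "10")` returns `True`, while a snippet that raises an exception or prints a different value returns `False`. The verdict depends only on what the code does when run, not on how it reads.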

Evaluation and Benchmarking · Agent and Tool Ecosystem · BigCode · BigCodeArena · Hugging Face