Entity · benchmark

BigCodeBench

benchmark · active · bigcodebench-1250f810 · 2 events · first seen May 19, 2026

Aliases: BigCodeBench

Co-occurring entities

HumanEval · Qwen3 Embedding · Fast Adaptive Semantic Entropy · Hugging Face

More like this (12)

LiveCodeBench · BigCode · PowerCodeBench · Big Bench · BigCodeArena · ChipBench · SpecBench · Big Bench Audio · SorryBench · ProgramBench · HealthBench · SlopCodeBench

Recent events (2)

5 · arXiv · cs.AI · Jun 9, 2026

FASE: Fast Adaptive Semantic Entropy for uncertainty quantification in multi-agent code generation

Researchers introduce Fast Adaptive Semantic Entropy (FASE), a metric that approximates the functional correctness of LLM-generated code using minimum spanning trees over structural and semantic dissimilarity graphs, replacing costly LLM-driven equivalence checks. Evaluated on HumanEval and BigCodeBench with Qwen3-Embedding-8B, FASE achieves a 25% improvement in Spearman correlation and a 19% increase in ROC AUC over prior semantic entropy methods. It requires only about 0.3% of the runtime cost of traditional semantic entropy approaches, which makes it practical for real-world multi-agent workflows.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Qwen3 Embedding · Fast Adaptive Semantic Entropy · BigCodeBench
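
The FASE summary above mentions minimum spanning trees built over dissimilarity graphs of generated code samples. The sketch below illustrates only that general idea and is not the paper's algorithm: it assumes each sample has already been embedded as a vector, uses cosine dissimilarity as the edge weight, and reports the mean MST edge weight as a hypothetical dispersion score. The function names (`cosine_dissimilarity`, `mst_total_weight`, `mst_dispersion`) and the scoring formula are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine_dissimilarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mst_total_weight(dist: List[List[float]]) -> float:
    """Total edge weight of a minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(dist)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each node
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the cheapest node that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax the edges from the newly added node.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return total


def mst_dispersion(embeddings: List[Sequence[float]]) -> float:
    """Hypothetical uncertainty proxy: mean MST edge weight over the samples."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    dist = [[cosine_dissimilarity(a, b) for b in embeddings] for a in embeddings]
    return mst_total_weight(dist) / (n - 1)
```

Under these assumptions, a score near zero means the sampled programs cluster tightly in embedding space, while a larger score means they disagree. How FASE combines structural and semantic dissimilarities, and how it turns the tree into an entropy estimate, is not specified in the summary and is not modeled here.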

5 · Hugging Face Blog · May 19, 2026

BigCodeBench: The Next Generation of HumanEval

Hugging Face introduces BigCodeBench, a new code generation benchmark designed to succeed HumanEval by offering more challenging and diverse programming tasks. The benchmark aims to better evaluate LLMs on real-world coding scenarios involving complex function calls and library usage. A leaderboard accompanies the release to track model performance across the community.

Evaluation and Benchmarking · Agent and Tool Ecosystem · BigCodeBench · Hugging Face · HumanEval