Entity · benchmark

NPHardEval

benchmark · active · nphardeval-256b2b28 · 1 event · first seen May 19, 2026

Aliases: NPHardEval

More like this (12)

HumanEval · ProActEval · L-Eval · ParaEval · HypoEval · NP-Hard · G-Eval · DraftNEPABench · ValueEval · NeMo Evaluator · GIFT-Eval · AlpacaEval 2

Recent events (1)

5 · Hugging Face Blog · May 19, 2026

NPHardEval Leaderboard: Benchmarking LLM Reasoning via Computational Complexity Classes

The NPHardEval leaderboard evaluates large language models on reasoning tasks grouped by computational complexity class (P, NP-complete, and NP-hard), giving a structured way to compare algorithmic reasoning as problems get harder. The benchmark refreshes its problem instances over time to mitigate data contamination, a persistent weakness of static benchmarks. Results are hosted on Hugging Face and are meant to show how frontier models' performance changes with computational hardness.
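The idea of refreshing problem instances can be illustrated with a minimal sketch. This is not NPHardEval's own code: the function names (`make_instance`, `verify`, `brute_force`), the choice of subset sum as the task, and the use of a month string as the seed are all assumptions made for illustration. The sketch shows two properties a dynamically updated benchmark would plausibly want: instances are regenerated from a seed, so they cannot be memorized from a fixed test set, and answers are checked by a verifier rather than compared against a stored answer key.

```python
import random
from itertools import combinations


def make_instance(seed, n=8, max_value=50):
    """Generate a subset-sum instance from a seed (hypothetical helper).

    The same seed always yields the same instance, so a benchmark round
    is reproducible, while a new seed yields fresh, unseen problems.
    """
    rng = random.Random(seed)
    numbers = [rng.randint(1, max_value) for _ in range(n)]
    # Build the target from a hidden subset so at least one solution exists.
    k = rng.randint(1, n)
    chosen = rng.sample(range(n), k)
    target = sum(numbers[i] for i in chosen)
    return numbers, target


def verify(numbers, target, indices):
    """Check a proposed answer: distinct, in-range indices summing to target."""
    if len(set(indices)) != len(indices):
        return False
    if any(i < 0 or i >= len(numbers) for i in indices):
        return False
    return sum(numbers[i] for i in indices) == target


def brute_force(numbers, target):
    """Reference solver that tries every non-empty subset (exponential time)."""
    for r in range(1, len(numbers) + 1):
        for combo in combinations(range(len(numbers)), r):
            if sum(numbers[i] for i in combo) == target:
                return list(combo)
    return None


if __name__ == "__main__":
    # Assumed convention: one seed per refresh period, e.g. a month label.
    numbers, target = make_instance("2026-05")
    answer = brute_force(numbers, target)
    print(numbers, target, answer, verify(numbers, target, answer))
```

Verifying a candidate answer takes linear time while the reference solver is exponential in the number of items, which is the asymmetry that makes NP-complete tasks convenient to grade automatically.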

Frontier Model Releases · Evaluation and Benchmarking · Hugging Face · NPHardEval · NP-Hard