NPHardEval
benchmark · active
nphardeval-256b2b28 · 1 event · first seen 28d ago · Aliases: NPHardEval
Recent events (1)
NPHardEval Leaderboard: Benchmarking LLM Reasoning via Computational Complexity Classes
The NPHardEval leaderboard evaluates large language models on reasoning tasks grouped by computational complexity class (P, NP-complete, and NP-hard), giving a structured framework for assessing algorithmic reasoning. To mitigate data contamination, a persistent problem for static benchmarks, the benchmark refreshes its problem instances over time rather than fixing a single test set. Results are hosted on Hugging Face and are intended to reveal systematic differences in how frontier models handle problems of varying computational hardness.
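
The idea of dynamic problem updates can be illustrated with a small sketch. This is not NPHardEval's actual generator or scoring code; it is a minimal, hypothetical example that assumes a knapsack task, a made-up `release_tag` string used as a random seed, and the helper names `generate_instance`, `optimal_value`, and `score_answer`. It shows how a fresh instance can be produced per release and how a model's answer can be checked programmatically instead of being matched against a fixed answer key.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class KnapsackInstance:
    weights: tuple[int, ...]
    values: tuple[int, ...]
    capacity: int


def generate_instance(release_tag: str, n_items: int = 6) -> KnapsackInstance:
    """Build a reproducible instance for one (hypothetical) benchmark release."""
    rng = random.Random(release_tag)  # same tag -> same instance, new tag -> new instance
    weights = tuple(rng.randint(1, 20) for _ in range(n_items))
    values = tuple(rng.randint(1, 50) for _ in range(n_items))
    # Keep the capacity large enough that at least one item always fits.
    capacity = max(max(weights), sum(weights) // 2)
    return KnapsackInstance(weights, values, capacity)


def optimal_value(inst: KnapsackInstance) -> int:
    """Reference optimum via the standard 0/1 knapsack dynamic program."""
    best = [0] * (inst.capacity + 1)
    for w, v in zip(inst.weights, inst.values):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(inst.capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[inst.capacity]


def score_answer(inst: KnapsackInstance, chosen: list[int]) -> bool:
    """Accept an answer only if it is a feasible selection that reaches the optimum."""
    if len(set(chosen)) != len(chosen):
        return False  # an item was listed twice
    if any(i < 0 or i >= len(inst.weights) for i in chosen):
        return False  # index out of range
    if sum(inst.weights[i] for i in chosen) > inst.capacity:
        return False  # exceeds capacity
    return sum(inst.values[i] for i in chosen) == optimal_value(inst)


if __name__ == "__main__":
    inst = generate_instance("2024-01")
    print(inst, optimal_value(inst))
```

Because the instance is derived from the release tag, a model cannot score well by memorising answers from an earlier release; the verifier recomputes the reference optimum for whatever instance was generated.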