Erdős Problems
benchmark · active
erd-s-problems-c6ebf90a · 1 event · first seen 26 days ago
Aliases: Erdős Problems
Recent events (1)
Large-Scale Evaluation of LLM-Driven Formal Proof Search on Open Mathematical Problems
Researchers present the first large-scale evaluation of LLM-based formal proof search on genuinely open mathematical problems, using Lean as a verification backend. Their most capable agent autonomously resolved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures, at a cost of a few hundred dollars per problem. The system is already being deployed in active research across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics. The study also compares agent architectures, finding that more sophisticated designs outperform simple generate-and-verify loops on the hardest problems.
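The generate-and-verify baseline mentioned above can be pictured as a loop. It repeatedly asks a model for a candidate proof and keeps the first one the checker accepts. The sketch below is illustrative only. `propose` and `verify` are hypothetical stand-ins for an LLM call and a Lean check. The source does not say whether its baseline feeds verifier errors back to the model, as this version does.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SearchResult:
    proof: Optional[str]          # accepted proof, or None if the budget ran out
    attempts_used: int            # how many candidates were checked
    feedback_log: list[str] = field(default_factory=list)  # verifier messages for rejected candidates


def generate_and_verify(
    propose: Callable[[str, list[str]], str],   # hypothetical: stands in for an LLM call
    verify: Callable[[str], tuple[bool, str]],  # hypothetical: stands in for a Lean check
    statement: str,
    max_attempts: int,
) -> SearchResult:
    """Sample candidates until one verifies or the attempt budget is spent."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(statement, feedback)
        ok, message = verify(candidate)
        if ok:
            return SearchResult(candidate, attempt, feedback)
        # Record why the candidate was rejected so the next proposal can use it.
        feedback.append(message)
    return SearchResult(None, max_attempts, feedback)


# Toy stand-ins: a real system would query a model and run the Lean checker.
candidates = iter(["by sorry", "by simp; sorry", "by decide"])


def toy_propose(statement: str, feedback: list[str]) -> str:
    return next(candidates)


def toy_verify(candidate: str) -> tuple[bool, str]:
    # Treat any proof containing `sorry` as incomplete, as Lean would flag it.
    if "sorry" in candidate:
        return False, "proof contains `sorry`"
    return True, ""


result = generate_and_verify(toy_propose, toy_verify, "2 + 2 = 4", max_attempts=5)
print(result.proof, result.attempts_used)  # by decide 3
```

Capping the search with `max_attempts` is a simplification. The study reports cost in dollars per problem, so a real system would more likely track spend than a fixed attempt count.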