OEIS Conjectures

benchmark · active · oeis-conjectures-615be3bf · 1 event · first seen 26d ago

Aliases: OEIS Conjectures

Recent events (1)

8 · arXiv · cs.AI · 26d ago

Large-Scale Evaluation of LLM-Driven Formal Proof Search on Open Mathematical Problems

Researchers present the first large-scale evaluation of LLM-based formal proof search on genuinely open mathematical problems, using Lean as a verification backend. Their most capable agent autonomously resolved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures, at a cost of a few hundred dollars per problem. The system is already being deployed in active research across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics. The study also compares agent architectures, finding that more sophisticated designs outperform simple generate-and-verify loops on the hardest problems.
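
For scale, the reported counts can be turned into success rates. The sketch below uses only the figures quoted in the summary; the names `results` and `success_rate` are illustrative and do not come from the paper.

```python
# Counts as quoted in the summary above: (resolved, total attempted).
results = {
    "Erdős problems": (9, 353),
    "OEIS conjectures": (44, 492),
}


def success_rate(solved: int, total: int) -> float:
    """Return the share of problems resolved, as a percentage."""
    return 100.0 * solved / total


for name, (solved, total) in results.items():
    print(f"{name}: {solved}/{total} = {success_rate(solved, total):.1f}%")
```

This works out to roughly 2.5% of the Erdős problems and 8.9% of the OEIS conjectures.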