Almanac
benchmark

MiniF2F

benchmark · active · provisional · minif2f-824721bb · 1 event · first seen 11d ago

Aliases: MiniF2F

Recent events (1)

8 · arXiv · cs.AI · 11d ago

Goedel-Architect achieves state-of-the-art formal theorem proving with blueprint-based agentic framework

Goedel-Architect is an agentic framework for formal theorem proving in Lean 4. Instead of recursive decomposition, it generates a blueprint (a dependency graph of definitions and lemmas), which enables parallel lemma closure and global refinement. Built on DeepSeek-V4-Flash (284B-A13B), it achieves 99.2% pass@1 on MiniF2F-test and 75.6% on PutnamBench, scaling to 100% on MiniF2F, 88.8% on PutnamBench, and 4/6 on IMO 2025 when seeded with natural-language proofs. The authors claim state-of-the-art performance for an open-source pipeline at up to 500x lower cost than comparable systems.
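
To make the blueprint idea concrete, here is a minimal sketch of a dependency graph of definitions and lemmas, grouped into batches that could be worked on at the same time. It is an illustration only: the node names and the `parallel_batches` helper are hypothetical, and batching by dependency level is an assumption made for this example, not the scheduling the authors describe.

```python
from graphlib import TopologicalSorter

# Hypothetical blueprint: each node maps to the nodes it depends on.
# Names are invented for illustration and do not come from the paper.
blueprint = {
    "def_f": [],
    "lemma_bound": ["def_f"],
    "lemma_mono": ["def_f"],
    "main_theorem": ["lemma_bound", "lemma_mono"],
}


def parallel_batches(graph):
    """Group nodes into batches whose dependencies all sit in earlier batches."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes with no unfinished dependencies
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking its dependents
    return batches


batches = parallel_batches(blueprint)
print(batches)
```

In this toy graph the two lemmas land in the same batch because neither depends on the other, which is the sense in which a dependency graph exposes work that can proceed in parallel.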