benchmark

ProofNet-Test

benchmark · active · provisional · proofnet-test-442e4c55 · 1 event · first seen 3d ago

Aliases: ProofNet-Test

Co-occurring entities

dLLM-Prover-7B · Diffusion-Proof · DeepSeek-Prover-V2-7B · dLLM-Corrector-7B · MiniF2F-Test

More like this (12)

PRNet · HypeNet · BitNet · Neural Theorem Prover · PowerCodeBench · CodeSearchNet · BitNet b1.58 · BrushNet · TokenBench · MiniF2F-Test · BOPTEST · ResNet-50

Recent events (1)

6 · arXiv · cs.LG · 3d ago

Diffusion-Proof: First framework applying diffusion LLMs to formal theorem proving

Researchers introduce Diffusion-Proof, the first framework to train and apply diffusion language models (dLLMs) for formal theorem proving, targeting the difficulty autoregressive (AR) models have with long-range coherence. The framework comprises two models: dLLM-Prover-7B, which generates whole proofs, and dLLM-Corrector-7B, which repairs proofs locally via bidirectional infilling. Diffusion-Proof achieves absolute improvements of 1.61 percentage points on ProofNet-Test and 6.14 percentage points on MiniF2F-Test over an AR baseline, and solves one IMO problem that DeepSeek-Prover-V2-7B could not. The result suggests that dLLMs may have structural advantages over AR models on tasks requiring long-range logical coherence.

Frontier Model Releases · Evaluation and Benchmarking · dLLM-Prover-7B · Diffusion-Proof · DeepSeek-Prover-V2-7B