benchmark

DafnyBench

benchmark · active · provisional · dafnybench-0e040d11 · 1 event · first seen 2d ago

Aliases: DafnyBench

More like this (12)

LiveCodeBench-Pro-Dafny, SorryBench, PhantomBench, DeliveryBench, FoldBench, AdversaBench, NatureBench, FuzzyBench, FuzzyBench, RoleBench, AdvBench, SupraBench

Recent events (1)

5 · arXiv · cs.AI · 2d ago

AxDafny: Agentic verified code generation framework achieves 92.7% on DafnyBench

Researchers introduce AxDafny, a verifier-guided agentic repair framework for generating formally verified Dafny code, including implementations, invariants, assertions, and termination arguments. The system achieves 92.7% verification success on DafnyBench, outperforming the strongest prior proof-hint baseline by 6.5 percentage points. The authors also release LCB-Pro-Dafny, a new benchmark of 250 competition-style problems translated into Dafny with formal specifications. The paper additionally finds that verification success and runtime test performance capture distinct dimensions of code quality.

Evaluation and Benchmarking · Agent and Tool Ecosystem · AxDafny · LiveCodeBench-Pro-Dafny · DafnyBench