DafnyBench

benchmark · active · provisional · dafnybench-0e040d11 · 1 event · first seen 2d ago

Aliases: DafnyBench

Recent events (1)

5 · arXiv · cs.AI · 2d ago

AxDafny: Agentic verified code generation framework achieves 92.7% on DafnyBench

Researchers introduce AxDafny, a verifier-guided agentic repair framework for generating formally verified Dafny code, including implementations, invariants, assertions, and termination arguments. The system achieves 92.7% verification success on DafnyBench, outperforming the strongest prior proof-hint baseline by 6.5 percentage points. The authors also release LCB-Pro-Dafny, a new benchmark of 250 competition-style problems translated into Dafny with formal specifications. The paper additionally finds that verification success and runtime test performance capture distinct dimensions of code quality.
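To make "verifier-guided repair" concrete, here is a minimal Python sketch of the general pattern: run a verifier, feed its error messages to a component that proposes a revised program, and repeat until verification succeeds or a budget runs out. This is an illustration under stated assumptions, not AxDafny's actual implementation. The names `repair_loop`, `VerifyResult`, `toy_verify`, and `toy_propose_fix` are hypothetical, and the toy verifier only checks for keywords instead of invoking Dafny.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VerifyResult:
    ok: bool            # True when the program verified
    errors: List[str]   # verifier messages to feed back into repair


def repair_loop(
    program: str,
    verify: Callable[[str], VerifyResult],
    propose_fix: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Generic verify-then-repair loop (hypothetical sketch).

    Returns a program that passes `verify`, or None if the budget is exhausted.
    """
    for _ in range(max_rounds):
        result = verify(program)
        if result.ok:
            return program
        # Use the verifier's feedback to produce the next candidate.
        program = propose_fix(program, result.errors)
    # The last proposed candidate has not been checked yet.
    return program if verify(program).ok else None


# Toy stand-ins (assumptions): a real system would call the Dafny verifier
# and a language model here.
def toy_verify(program: str) -> VerifyResult:
    errors = [f"missing {kw}" for kw in ("invariant", "decreases") if kw not in program]
    return VerifyResult(ok=not errors, errors=errors)


def toy_propose_fix(program: str, errors: List[str]) -> str:
    keyword = errors[0].split()[-1]  # address the first reported error only
    return program + f"\n// placeholder {keyword} clause"


if __name__ == "__main__":
    start = "while i < n { i := i + 1; }"
    print(repair_loop(start, toy_verify, toy_propose_fix))
```

Passing the verifier and the fix proposer in as callables keeps the loop independent of any particular toolchain, so the same control flow could wrap a real verifier invocation and a model call.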