benchmark
LiveCodeBench-Pro-Dafny
benchmark · active · provisional
livecodebench-pro-dafny-5a2796c1 · 1 event · first seen 2d ago
Aliases: LiveCodeBench-Pro-Dafny
Recent events (1)
AxDafny: Agentic verified code generation framework achieves 92.7% on DafnyBench
Researchers introduce AxDafny, a verifier-guided agentic repair framework for generating formally verified Dafny code, including implementations, invariants, assertions, and termination arguments. The system achieves 92.7% verification success on DafnyBench, outperforming the strongest prior proof-hint baseline by 6.5 percentage points. The authors also release LCB-Pro-Dafny (LiveCodeBench-Pro-Dafny), a new benchmark of 250 competition-style problems translated into Dafny with formal specifications. The paper also finds that verification success and runtime test performance capture distinct dimensions of code quality.