technique
isomorphism-based scoring
technique · active · provisional
isomorphism-based-scoring-6a4de5a0 · 1 event · first seen 22d ago
Aliases: isomorphism-based scoring
Co-occurring entities
More like this (12)
SciBERTScore
BERTScore
integrated Brier score
spoiler-score detector
concrete score function
LLM-judge scoring
synthetic data evaluation
human alignment benchmarks (perceptual similarity, gloss, robustness, shape-texture)
AI-Assisted Systematization for Evaluating GenAI Systems
paired-scenario forced-choice probe
Representational Similarity Analysis
Matching Principle
Recent events (1)
Agentic Proving for Program Verification: Claude Code Achieves 98.1% on CLEVER Benchmark
Researchers evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation, achieving 98.1% end-to-end success on program generation and verification over self-consistent entries. The system generates valid specifications for 98.8% of problems and certifies implementations against ground-truth specifications for 87.5% of problems. The results reveal a growing mismatch between existing program verification benchmark difficulty and modern agentic prover capabilities, motivating calls for more rigorous evaluation methodologies. The findings support compiler-in-the-loop agentic paradigms as the current state-of-the-art for foundational program verification.