The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization
Recent events (1)
Signal-Coverage Matrix proposes finer-grained evaluation of LLM autoformalization errors
A new arXiv preprint introduces the signal-coverage matrix, a 2×2 framework that crosses the Lean elaborator's pass/fail outcome with a semantic-equivalence judgment, decomposing autoformalization errors into four distinct cells rather than a single type-correctness scalar (TC%). The authors evaluate four methods (Vanilla, Lean-Retry, Sample-Filter, and Stratified Autoformalization) on ProofNet# and MiniF2F-test using DeepSeek V4-Pro. They find that headline TC% gains mask flat recovery of semantic-only errors, and that symbolic and LLM judges diverge by 26–37 percentage points on outputs produced with elaborator feedback. The work argues that TC% improvements should be credited according to which error cell moved, not by the aggregate scalar alone.
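The 2×2 decomposition described above can be sketched in a few lines of Python. This is an illustrative sketch, not the preprint's code: the cell labels, the `Outcome` type, and the toy counts are all assumptions made for the example, and the semantic-equivalence verdict is treated as a boolean supplied by some external judge.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    elaborates: bool  # did the Lean elaborator accept the statement?
    equivalent: bool  # did the semantic-equivalence judge accept it?


def cell(o: Outcome) -> str:
    """Map one outcome to a cell of the 2x2 matrix (labels are hypothetical)."""
    if o.elaborates:
        return "pass_equivalent" if o.equivalent else "pass_not_equivalent"
    return "fail_equivalent" if o.equivalent else "fail_not_equivalent"


def coverage_matrix(outcomes: list[Outcome]) -> Counter:
    """Count how many outcomes fall in each of the four cells."""
    return Counter(cell(o) for o in outcomes)


def tc_rate(matrix: Counter) -> float:
    """Aggregate type-check rate: everything the elaborator accepted."""
    total = sum(matrix.values())
    passed = matrix["pass_equivalent"] + matrix["pass_not_equivalent"]
    return passed / total


def make(n_pe: int, n_pn: int, n_fe: int, n_fn: int) -> list[Outcome]:
    """Build a toy outcome list from per-cell counts (invented data)."""
    return (
        [Outcome(True, True)] * n_pe
        + [Outcome(True, False)] * n_pn
        + [Outcome(False, True)] * n_fe
        + [Outcome(False, False)] * n_fn
    )


# Toy numbers only: a retry-style method raises TC% while the
# semantic-only error cell (type-checks but not equivalent) stays flat.
baseline = coverage_matrix(make(5, 2, 1, 2))
retry = coverage_matrix(make(7, 2, 0, 1))

print(tc_rate(baseline), tc_rate(retry))
print(baseline["pass_not_equivalent"], retry["pass_not_equivalent"])
```

In this toy comparison the aggregate type-check rate rises while the count of statements that elaborate but are not semantically equivalent is unchanged, which is the kind of movement a single TC% scalar cannot distinguish.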