Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models
selection-without-signal-recovery-through-expression-a-measurement-study-of-post-hoc-falsification-operators-for-frozen-small-code-models-4eea9bcd·1 events·first seen 31h agoAliases: Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models
Co-occurring entities
More like this (12)
Recent events (1)
Post-hoc falsification operators for frozen small code models fail to beat Best-of-N in leakage-free evaluation
A measurement study evaluates 26 post-hoc operators (selection, verification, repair, elimination, portfolios) applied to frozen small code models (≤1.5B parameters) against a Best-of-N baseline under a strict leakage-free, matched-compute protocol. None of the semantic operators improves held-out accuracy over BoN, with the failure traced to three structural mechanisms: a coverage wall, a capability scissors, and a near-empty consensus trap. Two non-semantic operators do provide value: an expression-layer recovery method (M1) lifts DeepSeek-Coder-1.3B by +12 tasks on HumanEval+ (p=2.4e-4), and an adaptive consensus early-stop saves ~19% compute with no accuracy harm. The paper's core lesson is that harness quality and coverage measurement should precede investment in semantic post-hoc reasoning.