Verifier-in-the-Loop Training (ViL)

technique · active · provisional · verifier-in-the-loop-training-vil--9951112b · 1 event · first seen 18d ago

Aliases: Verifier-in-the-Loop Training (ViL)

Recent events (1)

7 · arXiv · cs.CL · 18d ago

Self-Trained Verification (STV) Unlocks Training- and Test-Time Self-Improvement for Reasoning Models

This paper introduces Self-Trained Verification (STV), a method that trains a verifier to imitate a more informed version of itself, using reference solutions as the supervision signal. It addresses the core bottleneck in both test-time verification-refinement loops and self-training pipelines. At test time, STV roughly doubles accuracy on hard math and achieves a 14x lift on scientific reasoning tasks. At training time, the authors combine STV with RL in a procedure called Verifier-in-the-Loop (ViL) training, which yields a further 33% gain in pass@1 over an already RL-converged generator; standalone pass@1 climbs 30% (relative) past standard RL convergence. The work argues that verification quality, not generation, is the primary bottleneck for scaling reasoning on hard problems.
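
To make the test-time verification-refinement loop mentioned above concrete, here is a minimal sketch of its general shape. It is not the paper's implementation: the function names (`verify_refine`, `toy_generate`, `toy_verify`, `toy_refine`), the critique strings, and the integer-square-root toy task are all hypothetical stand-ins. In particular, the toy verifier checks the answer exactly, whereas a learned verifier such as STV would be a model that can be wrong.

```python
from typing import Callable, Tuple

Verdict = Tuple[bool, str]  # (accepted?, critique text); hypothetical format


def verify_refine(
    problem: int,
    generate: Callable[[int], int],
    verify: Callable[[int, int], Verdict],
    refine: Callable[[int, int, str], int],
    max_rounds: int = 4,
) -> Tuple[int, int]:
    """Generic loop: propose, verify, refine until accepted or out of budget.

    Returns the final candidate and the number of refinement steps used.
    """
    candidate = generate(problem)
    for round_idx in range(max_rounds):
        accepted, critique = verify(problem, candidate)
        if accepted:
            return candidate, round_idx
        candidate = refine(problem, candidate, critique)
    # Budget exhausted: the last candidate is returned unverified.
    return candidate, max_rounds


# Toy stand-ins (assumptions for illustration): find x with x * x == n.
def toy_generate(n: int) -> int:
    return 0  # deliberately poor first guess


def toy_verify(n: int, x: int) -> Verdict:
    if x * x == n:
        return (True, "ok")
    return (False, "too low" if x * x < n else "too high")


def toy_refine(n: int, x: int, critique: str) -> int:
    return x + 1 if critique == "too low" else x - 1


answer, rounds = verify_refine(9, toy_generate, toy_verify, toy_refine)
print(answer, rounds)  # 3 3
```

The loop can only improve the candidate as far as the verifier's verdicts and critiques are reliable, which is consistent with the summary's claim that verification quality is the limiting factor.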