Verifier-in-the-Loop Training (ViL)

technique · active · verifier-in-the-loop-training-vil--9951112b · 1 event · first seen May 29, 2026

Aliases: Verifier-in-the-Loop Training (ViL)

Co-occurring entities

self-training · Self-Trained Verification (STV) · verification-refinement loop · reinforcement learning from verifier feedback

More like this (12)

Self-Trained Verification (STV)
RELAI Verifiable Continual Learning
reinforcement learning from verifier feedback
Video PreTraining (VPT)
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
test-time training
Right in the Right Way: LM Training with Verifiable Rewards and Human Demonstrations
LLM-as-a-Verifier
self-training
VerifierBench
selective-verification layer
execution-based verification

Recent events (1)

7 · arXiv · cs.CL · May 29, 2026

Self-Trained Verification (STV) Unlocks Training- and Test-Time Self-Improvement for Reasoning Models

This paper introduces Self-Trained Verification (STV), a method that trains a verifier to imitate a more informed version of itself, using reference solutions as the supervision signal. The authors present verification as the core bottleneck in both test-time verification-refinement loops and self-training pipelines. At test time, STV roughly doubles accuracy on hard math and achieves a 14x lift on scientific reasoning tasks. At training time, the authors combine STV with RL in a procedure called Verifier-in-the-Loop (ViL) training, which yields a further 33% gain in pass@1 over an already RL-converged generator, with standalone pass@1 climbing 30% (relative) past standard RL convergence. The work argues that verification quality, not generation, is the primary bottleneck for scaling reasoning on hard problems.
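For readers unfamiliar with the term, a verification-refinement loop of the kind the summary mentions can be sketched in a few lines. This is only an illustrative toy, not the paper's implementation: the function `verify_refine`, its parameters, and the stand-ins `toy_generate` and `toy_verify` are all hypothetical names, and the toy verifier cheats by hard-coding the answer, whereas a real verifier has no reference solution at test time.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures, assumed for illustration only.
Generator = Callable[[str, Optional[str], Optional[str]], str]
Verifier = Callable[[str, str], Tuple[bool, str]]


def verify_refine(
    problem: str,
    generate: Generator,
    verify: Verifier,
    max_rounds: int = 4,
) -> Tuple[str, bool, int]:
    """Draft an answer, ask the verifier to judge it, and refine on feedback.

    Returns (final_answer, accepted, rounds_used).
    """
    previous: Optional[str] = None
    feedback: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        # The generator sees its last attempt and the verifier's critique.
        previous = generate(problem, previous, feedback)
        accepted, feedback = verify(problem, previous)
        if accepted:
            return previous, True, round_idx
    return previous or "", False, max_rounds


def toy_generate(problem: str, previous: Optional[str], feedback: Optional[str]) -> str:
    """Toy generator: start at 3, then move one step in the direction of the feedback."""
    if previous is None:
        return "3"
    step = 1 if feedback == "too low" else -1
    return str(int(previous) + step)


def toy_verify(problem: str, answer: str) -> Tuple[bool, str]:
    """Toy verifier with a hard-coded target (assumption for the demo only)."""
    target = 5
    value = int(answer)
    if value == target:
        return True, "ok"
    return False, ("too low" if value < target else "too high")


if __name__ == "__main__":
    print(verify_refine("2 + 3", toy_generate, toy_verify))
```

In this sketch the quality of the final answer depends entirely on the verifier's feedback, which matches the summary's framing of verification as the bottleneck. With a verifier that accepts wrong answers or gives misleading critiques, the loop terminates early or wanders regardless of how capable the generator is.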

Frontier Model Releases · Evaluation and Benchmarking · self-training · Verifier-in-the-Loop Training (ViL) · Self-Trained Verification (STV)