Almanac
paper

Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

paperactiveprovisionalscaling-llm-reasoning-from-minimal-labels-a-semi-supervised-framework-with-a-lightweight-verifier-f575f627·1 events·first seen 25h ago

Aliases: Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

Co-occurring entities

More like this (12)

Recent events (1)

5arXiv · cs.CL·25h ago·source ↗

Semi-supervised framework scales LLM reasoning with minimal labeled data via lightweight verifier

A new arXiv preprint proposes a semi-supervised framework for training LLMs to reason with very few labeled examples, using a lightweight classifier to judge the validity of intermediate reasoning traces. An entropy-based confidence threshold filters unreliable pseudo-labels before fine-tuning. Experiments on math reasoning (Orca-Math subset) and visual QA (GQA) show accuracy comparable to using 10-15x more labeled data. The approach reduces dependence on expensive answer-level supervision by turning verification into a data-creation mechanism.