paper
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
paper · active · provisional
scaling-llm-reasoning-from-minimal-labels-a-semi-supervised-framework-with-a-lightweight-verifier-f575f627 · 1 event · first seen 25h ago
Aliases: Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
More like this (12)
Operadic consistency: a label-free signal for compositional reasoning failures in LLMs
Soft Label Supervision
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning
Reasoning Language Models
Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
Learning from the Self-future: On-policy Self-distillation for dLLMs
Watch, Remember, Reason: Human-View Video Understanding with MLLMs
Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs
Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
Recent events (1)
Semi-supervised framework scales LLM reasoning with minimal labeled data via lightweight verifier
A new arXiv preprint proposes a semi-supervised framework for training LLMs to reason from very few labeled examples. A lightweight classifier (the verifier) judges the validity of intermediate reasoning traces, and an entropy-based confidence threshold filters out unreliable pseudo-labels before fine-tuning. Experiments on math reasoning (an Orca-Math subset) and visual QA (GQA) show accuracy comparable to training with 10 to 15 times more labeled data. The approach reduces dependence on expensive answer-level supervision by turning verification into a data-creation mechanism.
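The entropy-based filtering step can be illustrated with a minimal sketch. This is not the preprint's implementation: it assumes the verifier emits a single probability that a reasoning trace is valid, measures uncertainty as the binary Shannon entropy of that probability, and keeps only traces whose entropy falls at or below a cutoff. The names `binary_entropy`, `filter_pseudo_labels`, and `max_entropy`, and the cutoff value of 0.5 bits, are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) distribution."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain prediction carries no entropy
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def filter_pseudo_labels(
    scored_traces: List[Tuple[str, float]],
    max_entropy: float = 0.5,  # hypothetical cutoff, not taken from the preprint
) -> List[Tuple[str, bool]]:
    """Keep traces the verifier is confident about and attach a pseudo-label.

    scored_traces holds (trace, p_valid) pairs, where p_valid is the assumed
    verifier probability that the trace is valid.
    """
    kept = []
    for trace, p_valid in scored_traces:
        if binary_entropy(p_valid) <= max_entropy:
            kept.append((trace, p_valid >= 0.5))
    return kept


# Usage: the 0.6 score is too uncertain (about 0.97 bits) and is dropped.
examples = [("trace-a", 0.95), ("trace-b", 0.60), ("trace-c", 0.03)]
confident = filter_pseudo_labels(examples)
# confident == [("trace-a", True), ("trace-c", False)]
```

Under these assumptions, only the confidently valid traces (pseudo-label `True`) would typically be passed on as fine-tuning data; whether the preprint also uses confidently invalid traces is not stated in the summary above.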