Entity · paper

Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

paper · active · scaling-llm-reasoning-from-minimal-labels-a-semi-supervised-framework-with-a-lightweight-verifier-f575f627 · 1 event · first seen Jun 16, 2026

Aliases: Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

Co-occurring entities

GQA · Orca-Math

More like this (12)

Operadic consistency: a label-free signal for compositional reasoning failures in LLMs
Soft Label Supervision
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
Explicit Fuzzy Logic in the Feed-Forward Layer: Self-Forgetting Quantifiers Discover Legible Grammatical-Licensing Detectors
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning
Can We Break LLMs Out of Self-Loops? Fine-Grained Reasoning Control with Activation Steering
Partially Correlated Verifier Cascades in LLM Harnesses: Concave Log-Odds, Polynomial Reliability, and Blind-Spot Ceilings
Reasoning Language Models
Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs

Recent events (1)

5 · arXiv · cs.CL · Jun 16, 2026

Semi-supervised framework scales LLM reasoning with minimal labeled data via lightweight verifier

A new arXiv preprint proposes a semi-supervised framework for training LLMs to reason from very few labeled examples. A lightweight classifier (the verifier) judges the validity of intermediate reasoning traces, and an entropy-based confidence threshold filters out unreliable pseudo-labels before fine-tuning. In experiments on math reasoning (an Orca-Math subset) and visual question answering (GQA), the framework reportedly reaches accuracy comparable to training with 10 to 15 times more labeled data. The approach reduces dependence on expensive answer-level supervision by turning verification into a data-creation mechanism.
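The entropy-based filtering step described in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the verifier outputs a single probability that a trace is valid, measures confidence with binary Shannon entropy, and uses a hypothetical cutoff of 0.5 bits. The names `binary_entropy`, `filter_pseudo_labels`, and `max_entropy` are illustrative only, and keeping confidently invalid traces as negative examples is also an assumption.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain prediction carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def filter_pseudo_labels(traces, verifier_probs, max_entropy=0.5):
    """Keep only traces on which the verifier is confident (low entropy).

    verifier_probs[i] is the assumed verifier probability that traces[i]
    is valid. Returns (trace, pseudo_label) pairs, where pseudo_label is
    True when that probability is at least 0.5.
    max_entropy is a hypothetical threshold, not a value from the paper.
    """
    kept = []
    for trace, p in zip(traces, verifier_probs):
        if binary_entropy(p) <= max_entropy:
            kept.append((trace, p >= 0.5))
    return kept


# Toy usage: four unlabeled traces with made-up verifier probabilities.
traces = ["t1", "t2", "t3", "t4"]
probs = [0.97, 0.55, 0.08, 0.70]
print(filter_pseudo_labels(traces, probs))
# prints [('t1', True), ('t3', False)]
```

Traces scored near 0.5 have entropy close to 1 bit and are dropped, so only confidently judged traces would reach the fine-tuning set. A stricter threshold trades pseudo-label volume for reliability.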

Evaluation and Benchmarking · Alignment and RLHF · GQA · Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier · Orca-Math