Orca-Math

dataset · active · provisional · orca-math-a882a766 · 1 event · first seen 26h ago

Aliases: Orca-Math

Recent events (1)

5 · arXiv · cs.CL · 26h ago

Semi-supervised framework scales LLM reasoning with minimal labeled data via lightweight verifier

A new arXiv preprint proposes a semi-supervised framework for training LLMs to reason from very few labeled examples, using a lightweight classifier to judge the validity of intermediate reasoning traces. An entropy-based confidence threshold filters out unreliable pseudo-labels before fine-tuning. Experiments on math reasoning (an Orca-Math subset) and visual QA (GQA) show accuracy comparable to that obtained with 10-15x more labeled data. The approach reduces dependence on expensive answer-level supervision by turning verification into a data-creation mechanism.
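The summary does not specify the preprint's verifier or its threshold, so the following is only a minimal sketch of what entropy-based pseudo-label filtering could look like. It assumes a binary verifier that returns a probability that a reasoning trace is valid; the names `binary_entropy`, `filter_pseudo_labels`, and the `max_entropy` cutoff of 0.5 bits are hypothetical and chosen for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a Bernoulli(p) verifier output."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain prediction carries no entropy
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def filter_pseudo_labels(candidates, max_entropy=0.5):
    """Keep only traces on which the (assumed) verifier is confident.

    candidates: iterable of (trace, p_valid) pairs, where p_valid is the
    verifier's probability that the trace is valid.
    Returns a list of (trace, pseudo_label) pairs, with pseudo_label True
    when the verifier judges the trace valid.
    """
    kept = []
    for trace, p_valid in candidates:
        # Low entropy means the verifier is confident either way.
        if binary_entropy(p_valid) <= max_entropy:
            kept.append((trace, p_valid >= 0.5))
    return kept


if __name__ == "__main__":
    # Hypothetical verifier scores for three reasoning traces.
    scored = [("trace A", 0.95), ("trace B", 0.55), ("trace C", 0.03)]
    for trace, label in filter_pseudo_labels(scored):
        print(trace, label)
```

In this sketch, a trace scored near 0.5 is discarded as unreliable, while traces scored near 0 or 1 are kept as confident negative or positive pseudo-labels for fine-tuning.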