product

VeriEvol

product · active · provisional · verievol-fe235755 · 1 event · first seen 10h ago

Aliases: VeriEvol

Co-occurring entities

GRPO · HTV-Agent

More like this (12)

VeriTrace · EEVEE · EvoStruct · EvoArena · EvoMem · EvoSkill · AlphaEvolve · EvolveNav · MLEvolve · Veo · P-GUI-Evo · KVEraser

Recent events (1)

5 · arXiv · cs.CL · 10h ago

VeriEvol: Verified data construction pipeline for scaling multimodal mathematical reasoning

VeriEvol is a framework for scaling reinforcement learning on visual mathematical reasoning by decoupling prompt difficulty expansion from answer reliability verification. A type-aware evolution module generates harder image-grounded prompts, and an HTV-Agent verifier accepts an answer only after failing to find counter-evidence against it. Scaling SFT data from 10K to 250K samples raises mean accuracy across five visual-math benchmarks from 35.42 to 54.73 (+19.31). Combined with GRPO-style training, it adds a +3.88 cumulative gain over an un-evolved RL baseline. The authors release prompts, data, models, code, and full verifier traces.
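The decoupling described above can be pictured with a minimal sketch. This is not the released VeriEvol code: `Sample`, `build_dataset`, the per-type `evolvers` mapping, and the `checks` list are hypothetical names, and the sketch assumes a sample is kept only when no check turns up counter-evidence against its answer.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Sample:
    image_id: str      # reference to the grounding image
    prompt: str
    answer: str
    problem_type: str  # hypothetical labels such as "algebra" or "geometry"


# Hypothetical signatures, not the VeriEvol API.
Evolver = Callable[[Sample], Sample]              # returns a harder variant
CounterCheck = Callable[[Sample], Optional[str]]  # counter-evidence, or None


def build_dataset(
    seeds: Iterable[Sample],
    evolvers: Dict[str, Evolver],
    checks: List[CounterCheck],
) -> List[Sample]:
    """Evolve each seed by problem type, then verify it independently."""
    kept: List[Sample] = []
    for seed in seeds:
        evolve = evolvers.get(seed.problem_type)
        if evolve is None:
            continue  # no evolution rule for this type: skip the seed
        candidate = evolve(seed)
        # Verification does not depend on how the prompt was produced:
        # keep the candidate only if every check fails to find counter-evidence.
        if all(check(candidate) is None for check in checks):
            kept.append(candidate)
    return kept


# Toy usage with stand-in rules (illustrative only).
def harder(sample: Sample) -> Sample:
    return replace(sample, prompt=sample.prompt + " Justify each step.")


def non_empty_answer(sample: Sample) -> Optional[str]:
    return None if sample.answer else "answer is empty"


seeds = [
    Sample("img1", "Find x.", "4", "algebra"),
    Sample("img2", "Find the area.", "", "geometry"),
    Sample("img3", "Read the chart.", "7", "chart"),
]
kept = build_dataset(
    seeds,
    {"algebra": harder, "geometry": harder},
    [non_empty_answer],
)
```

Keeping the evolver and the checks as separate callables means either side can be swapped without touching the other, which is the property the summary attributes to the pipeline. A real verifier would be an agent rather than a simple predicate.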

Evaluation and Benchmarking · Alignment and RLHF · GRPO · VeriEvol · HTV-Agent