HTV-Agent

product · active · provisional · htv-agent-0f191ee8 · 1 event · first seen 9h ago

Aliases: HTV-Agent

Recent events (1)

arXiv · cs.CL · 9h ago

VeriEvol: Verified data construction pipeline for scaling multimodal mathematical reasoning

VeriEvol is a new framework for scaling reinforcement learning on visual mathematical reasoning by decoupling prompt difficulty expansion from answer reliability verification. It uses a type-aware evolution module to generate harder image-grounded prompts and an HTV-Agent verifier that accepts an answer only after failing to find counter-evidence against it. Scaling SFT data from 10K to 250K samples raises mean accuracy from 35.42 to 54.73 across five visual-math benchmarks, and combining the evolved data with GRPO-style training yields an additional +3.88 cumulative gain over an un-evolved RL baseline. The authors release prompts, data, models, code, and full verifier traces.