product

HTV-Agent

product · active · provisional · htv-agent-0f191ee8 · 1 event · first seen 9h ago

Aliases: HTV-Agent

Co-occurring entities

GRPO · VeriEvol

More like this (12)

STT-Agent-4B · SWE-Agent · RD-Agent · EurekAgent · HackAgent · Agent-S · HPDv2 · HPDv3 · Agents-K1 · Baseline Agent · TPC-H · FedTSV

Recent events (1)

5 · arXiv · cs.CL · 9h ago

VeriEvol: Verified data construction pipeline for scaling multimodal mathematical reasoning

VeriEvol is a framework for scaling reinforcement learning on visual mathematical reasoning by decoupling prompt difficulty expansion from answer reliability verification. A type-aware evolution module generates harder image-grounded prompts, and an HTV-Agent verifier accepts an answer only after failing to find counter-evidence against it. Scaling SFT data from 10K to 250K samples raises mean accuracy from 35.42 to 54.73 across five visual-math benchmarks, and combining the evolved data with GRPO-style training adds a further +3.88 cumulative gain over an un-evolved RL baseline. The authors release prompts, data, models, code, and full verifier traces.

Evaluation and Benchmarking · Alignment and RLHF · GRPO · VeriEvol · HTV-Agent