Entity · product

PROVE

product · active · prove-4c32b187 · 1 event · first seen Jun 3, 2026

Aliases: PROVE

Co-occurring entities

Qwen2.5-7B · GRPO · Qwen3-4B · BFCL Multi-Turn · T-Eval · TAU-bench · Granite 4.1 · Model Context Protocol

More like this (12)

ProverBench · agentic proving · Seed-Prover 1.5 · RePro · BOPTEST · Reverse Probing · ProAct · First Proof · Formal Proof Search · ProActEval · VERITAS · Neural Theorem Prover

Recent events (1)

7 · arXiv · cs.CL · Jun 3, 2026

PROVE framework trains LLMs for multi-step tool use via stateful MCP environments and programmatic rewards

Researchers introduce PROVE (Programmatic Rewards On Verified Environments), a framework for training LLMs to orchestrate multi-step tool calls using reinforcement learning. The system includes a library of 20 stateful MCP servers with 343 tools, an automated data synthesis pipeline that grounds training queries in live server state, and a multi-component programmatic reward function requiring no judge model. Training four models (Qwen3-4B, Qwen3-8B, Qwen2.5-7B, Granite-4.1-8B) with ~13K examples yields gains of up to +10.2 on BFCL Multi-Turn, +6.8 on tau2-bench, and +6.5 on T-Eval, demonstrating consistent improvements in multi-step tool orchestration.
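To make the idea of a judge-free, multi-component reward concrete, here is a minimal Python sketch. It is not PROVE's actual reward function: the summary does not list the reward components, so the two used here (call validity and final-state match), their weights, and the `ToyTodoServer` class are all hypothetical stand-ins. The toy server is an in-memory object, not a real MCP server.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTodoServer:
    """Hypothetical stand-in for one stateful tool server (not a real MCP server)."""
    items: dict = field(default_factory=dict)

    def call(self, tool: str, args: dict) -> bool:
        """Apply one tool call to the state; return False if the call is invalid."""
        if tool == "add_item" and "name" in args:
            self.items[args["name"]] = "open"
            return True
        if tool == "close_item" and args.get("name") in self.items:
            self.items[args["name"]] = "done"
            return True
        return False


def programmatic_reward(calls, expected_state, weights=(0.25, 0.75)):
    """Score a tool-call trajectory with two illustrative components.

    validity: fraction of calls the server accepted
    state:    1.0 if the final server state equals the expected state, else 0.0
    The weights are assumed values chosen for the example only.
    """
    server = ToyTodoServer()  # fresh state for every trajectory
    accepted = [server.call(tool, args) for tool, args in calls]
    validity = sum(accepted) / len(accepted) if accepted else 0.0
    state = 1.0 if server.items == expected_state else 0.0
    w_validity, w_state = weights
    return w_validity * validity + w_state * state


target = {"report": "done"}
good = [("add_item", {"name": "report"}), ("close_item", {"name": "report"})]
bad = [("close_item", {"name": "report"}), ("add_item", {"name": "report"})]

print(programmatic_reward(good, target))  # 1.0
print(programmatic_reward(bad, target))   # 0.125
```

The second trajectory uses the same two tools in the wrong order: the first call is rejected and the item is left open, so only half of the validity component is earned. Because the score is computed by replaying calls against server state, no judge model is involved, which is the property the summary highlights.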

Evaluation and Benchmarking · Agent and Tool Ecosystem · Qwen2.5-7B · GRPO · Qwen3-4B · +7 more