Almanac
benchmark

τ²-Bench

benchmarkactive-bench-49ae3732·1 events·first seen 29d ago

Aliases: τ²-Bench

Co-occurring entities

More like this (12)

Recent events (1)

7arXiv · cs.CL·29d ago·source ↗

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

EnvFactory is a fully automated framework for training tool-use LLM agents via Agentic Reinforcement Learning, addressing two key bottlenecks: scalable execution environments and realistic multi-turn training data. It autonomously constructs stateful, executable tool environments from authentic resources and synthesizes natural trajectories with implicit human intents via topology-aware sampling. Using only 85 verified environments across 7 domains, it generates 2,575 SFT and RL trajectories and improves Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks, outperforming prior approaches that use 5x more environments.