Almanac
Hamish Ivison

person · active · hamish-ivison-714ae0f8 · 1 event · first seen 7d ago

Aliases: Hamish Ivison

Recent events (1)

6 · arXiv · cs.CL · 7d ago

Tmax: Open RL training recipe for terminal-using agents achieves 27% on Terminal-Bench 2.0 with 9B parameters

Researchers present Tmax, an open RL training recipe for terminal-using language model agents. A 9B-parameter model trained with the recipe reaches 27% on Terminal-Bench 2.0, outperforming larger models from prior work. The recipe uses a novel data-generation taxonomy built on difficulty control, personas, and verifier diversification to produce a terminal-environment dataset more than 2.5x larger than previously released datasets. Training uses a simple outcome-only RL approach, and the authors release their data, models, and code to lower the barrier to academic research on terminal agents.