product
Tool-RL-Box
product · active · provisional
tool-rl-box-7a60e53d · 1 event · first seen 5d ago
Aliases: Tool-RL-Box
Recent events (1)
Paper diagnoses RL collapse in multi-step tool-use training and proposes supervisory signal fixes
A new arXiv preprint identifies a failure mode in reinforcement learning (RL) for LLM tool use: catastrophic collapse caused by probability spikes in control tokens, which disrupt structured execution while leaving the underlying tool-use capability intact. The authors systematically evaluate supervisory signals, including off-policy supervision, hint-based guidance, and erroneous-example supervision, under synchronous and interleaved training schemes. Interleaving supervised fine-tuning (SFT) with RL improves stability but degrades performance under out-of-distribution format and content evaluation. The code is released as Tool-RL-Box.