benchmark
AppWorld
benchmark · active · provisional
appworld-d2dae6ec · 1 event · first seen 7d ago
Aliases: AppWorld
Recent events (1)
KATE framework improves LLM tool calling via experiential knowledge integration and parallel reasoning
Researchers present KATE (Knowledge-Augmented Tool Execution), a framework that addresses LLM failures in multi-step tool use by systematically studying how knowledge is acquired, activated, and internalized. Key findings: instance-level experiential knowledge outperforms abstract intent-level knowledge; expanding reasoning width through parallel sampling with aggregation beats deeper chain-of-thought; and reinforcement learning outperforms supervised fine-tuning for knowledge internalization. KATE is evaluated on the BFCL-V3 and AppWorld benchmarks and shows consistent improvements over strong baselines across model scales.
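The summary does not specify how KATE aggregates its parallel samples. As a hedged illustration of "expanding reasoning width", the sketch below draws several independent candidate tool calls and keeps the most frequent one. It uses majority voting, a self-consistency-style aggregator swapped in here as an assumption. The names `width_then_aggregate`, `aggregate_by_vote`, `canonical`, and `fake_sampler` are hypothetical and are not KATE's API.

```python
import json
from collections import Counter
from typing import Callable

# Assumption: a tool call is modeled as a plain dict, e.g.
# {"name": "get_weather", "arguments": {"city": "Paris"}}.
ToolCall = dict


def canonical(call: ToolCall) -> str:
    """Serialize a call with sorted keys so equivalent calls compare equal."""
    return json.dumps(call, sort_keys=True)


def aggregate_by_vote(candidates: list[ToolCall]) -> ToolCall:
    """Return the most frequent candidate; ties go to the first one seen."""
    counts = Counter(canonical(c) for c in candidates)
    best_key, _ = counts.most_common(1)[0]
    return json.loads(best_key)


def width_then_aggregate(sample: Callable[[int], ToolCall], width: int) -> ToolCall:
    """Draw `width` independent samples (reasoning width) and aggregate them."""
    candidates = [sample(seed) for seed in range(width)]
    return aggregate_by_vote(candidates)


# Hypothetical stub sampler standing in for an LLM sampled at temperature > 0:
# every third sample contains a typo in the argument.
def fake_sampler(seed: int) -> ToolCall:
    city = "Pari" if seed % 3 == 2 else "Paris"
    return {"name": "get_weather", "arguments": {"city": city}}


result = width_then_aggregate(fake_sampler, width=5)
print(result)
```

With five samples, the single malformed call is outvoted by the four correct ones. Voting over canonicalized calls is only one possible aggregator. A learned verifier or a scoring model could replace `aggregate_by_vote` without changing the sampling loop.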