technique
IH-GRPO
techniqueactive
ih-grpo-decd3666·1 events·first seen 29d agoAliases: IH-GRPO
Co-occurring entities
More like this (12)
Recent events (1)
Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning
This paper introduces IH-GRPO, a reinforcement learning algorithm that decouples tool invocation from immediate execution during LLM reasoning, addressing the coherence disruption caused by tight coupling in existing tool-integrated reasoning (TIR) approaches. The authors propose a hierarchical control framework and derive a surrogate loss enabling an implicitly hierarchical policy to match the behavior of an explicit hierarchical policy. Experiments on Qwen3 models (1.7B, 4B, 8B) show absolute improvements of 1.87–2.53% across six out-of-domain mathematical reasoning benchmarks over the strongest baseline. Code is publicly released.