Entity · technique

IH-GRPO

technique · active · ih-grpo-decd3666 · 1 event · first seen May 19, 2026

Aliases: IH-GRPO

Co-occurring entities

GRPO · Tool-Integrated Reasoning · Qwen3-4B · Qwen3-1.7B

More like this (12)

GRPO · N-GRPO · AdvGRPO · Flow-GRPO · GRPO (Group Relative Policy Optimization) · Hcompany · Off-Context GRPO · IH-Challenge · Dr. GRPO · Latent-Anchored GRPO · GIC · AdaPrefix-GRPO

Recent events (1)

arXiv · cs.CL · May 19, 2026

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

This paper introduces IH-GRPO, a reinforcement learning algorithm that decouples tool invocation from immediate execution during LLM reasoning, addressing the coherence disruption caused by tight coupling in existing tool-integrated reasoning (TIR) approaches. The authors propose a hierarchical control framework and derive a surrogate loss enabling an implicitly hierarchical policy to match the behavior of an explicit hierarchical policy. Experiments on Qwen3 models (1.7B, 4B, 8B) show absolute improvements of 1.87–2.53% across six out-of-domain mathematical reasoning benchmarks over the strongest baseline. Code is publicly released.
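For context on the GRPO family that IH-GRPO builds on, the following is a minimal sketch of the group-relative advantage step used in standard GRPO: each sampled completion's reward is normalized against the other completions drawn for the same prompt. It does not reproduce the paper's surrogate loss or its hierarchical control framework. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampling group (standard GRPO, not IH-GRPO).

    rewards: scalar rewards for completions sampled from the same prompt.
    eps: small constant (assumed) that avoids division by zero when all
         rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored 1.0 if correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every completion earns the same reward carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

In this sketch, correct completions receive positive advantages and incorrect ones negative, while a group with identical rewards yields all-zero advantages and so contributes no gradient signal.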

Evaluation and Benchmarking · Agent and Tool Ecosystem · GRPO · Tool-Integrated Reasoning · Qwen3-4B