Thinking-Acting Gap
other · active · provisional
thinking-acting-gap-dedd7879 · 1 event · first seen 20d ago
Aliases: Thinking-Acting Gap
More like this (12)
Abstraction Gap
Vision-Language-Action models
adaptive thinking
extended thinking
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Research Gap Inference
regional-to-global perception gap
capability-reliability gap
sim-to-real gap
Vision-Language-Action model
Thinking Machines Interaction Model
When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following
Recent events (1)
AXPO: Agent Explorative Policy Optimization Addresses Thinking-Acting Gap in Multimodal Agentic Reasoning
This paper identifies a structural asymmetry in agentic reasoning, which it calls the 'Thinking-Acting Gap': under standard RL training with GRPO, the model attempts tool use in only about 30% of rollouts, and subgroups of tool-using rollouts that are all wrong suppress the learning signal. The authors propose AXPO (Agent eXplorative Policy Optimization), which, for such all-wrong subgroups, keeps the thinking prefix fixed and resamples only the tool calls, and combines this with uncertainty-based prefix selection. Evaluated on nine multimodal benchmarks with Qwen3-VL-Thinking at multiple scales, SFT+AXPO outperforms SFT+GRPO by 1.8 pp on both Pass@1 and Pass@4 at 8B, and the 8B model surpasses the 32B baseline on Pass@4 with 4× fewer parameters.
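Why all-wrong groups carry no signal follows from how GRPO normalizes rewards within a group. The Python sketch below assumes the common (r - mean) / (std + eps) form of the group-relative advantage and shows that a group with identical rewards (for example, all wrong) receives zero advantage everywhere, so it contributes no gradient. The `Rollout` fields, the `resample_all_wrong` helper and the use of a single scalar `uncertainty` score to pick the prefix are hypothetical illustrations of "fix the thinking prefix, resample the tool calls", not AXPO's actual implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantage: (r - mean) / (std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


@dataclass
class Rollout:
    prefix: str               # the "thinking" part of the trajectory
    tool_call: Optional[str]  # tool call made after thinking, or None (hypothetical field)
    reward: float             # 1.0 if the final answer is correct, else 0.0
    uncertainty: float        # hypothetical uncertainty score for the prefix


def resample_all_wrong(
    group: List[Rollout],
    sample_tool_call: Callable[[str], Rollout],
    k: int,
) -> List[Rollout]:
    """Hypothetical AXPO-style step for an all-wrong tool-using subgroup.

    If every tool-using rollout in the group failed, keep the most uncertain
    thinking prefix fixed and append k freshly sampled tool calls from it.
    Otherwise the group is returned unchanged.
    """
    tool_users = [r for r in group if r.tool_call is not None]
    if not tool_users or any(r.reward > 0 for r in tool_users):
        return group
    prefix = max(tool_users, key=lambda r: r.uncertainty).prefix
    return group + [sample_tool_call(prefix) for _ in range(k)]


if __name__ == "__main__":
    # Identical rewards (here: all wrong) give zero advantage, so no gradient signal.
    print(group_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]

    group = [
        Rollout("think A", "search(x)", 0.0, 0.2),
        Rollout("think B", "crop(img)", 0.0, 0.9),
    ]
    # Toy sampler standing in for the policy: the first new tool call succeeds.
    outcomes = iter([1.0, 0.0])
    repaired = resample_all_wrong(
        group, lambda p: Rollout(p, "zoom(img)", next(outcomes), 0.0), k=2
    )
    print([r.prefix for r in repaired[2:]])  # ['think B', 'think B']
    # The enlarged group now has mixed rewards, so advantages are non-zero.
    print(group_advantages([r.reward for r in repaired]))
```

Appending the resampled rollouts to the group instead of replacing the failed ones, and ranking prefixes by one scalar score, are simplifications chosen for this sketch; the paper's own selection rule and group bookkeeping may differ.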