VEPO
technique · active · provisional
vepo-ec527e9e · 1 event · first seen 13d ago
Aliases: VEPO
Recent events (1)
VEPO: Vision-anchored token selection improves RL for visual reasoning
A new arXiv paper identifies a failure mode of entropy-based credit assignment in multimodal reinforcement learning: vision-sensitive tokens with naturally low entropy are systematically ignored, causing the mechanism to collapse in visual reasoning tasks. The authors propose VEPO (Vision-Entropy token-selection for Policy Optimization), which couples visual sensitivity with token entropy via a multiplicative scheme to redirect gradient credit toward tokens that are both visually grounded and semantically informative. VEPO outperforms entropy-only baselines by 2.28 points at 7B scale and 3.15 points at 3B scale on visual reasoning benchmarks.
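The summary gives no formulas, so the following is only a minimal sketch of what a multiplicative coupling of token entropy and visual sensitivity could look like, not the paper's implementation. The function names, the `keep_fraction` parameter, the top-k selection rule, and the example numbers are all hypothetical. Visual sensitivity is taken as a precomputed per-token score, since the summary does not say how it is measured.

```python
import numpy as np


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution at each position.

    logits: array of shape (num_tokens, vocab_size).
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilise the softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)


def select_tokens(entropy, visual_sensitivity=None, keep_fraction=0.5):
    """Return a boolean mask over the tokens that would receive gradient credit.

    Assumption: tokens are ranked by a score and the top `keep_fraction`
    are kept. With `visual_sensitivity=None` the score is entropy alone
    (the entropy-only baseline); otherwise it is the product of the two.
    """
    entropy = np.asarray(entropy, dtype=float)
    if visual_sensitivity is None:
        score = entropy
    else:
        # Hypothetical multiplicative coupling: a token needs both signals
        # to rank highly, but strong visual sensitivity can lift a token
        # whose entropy is only moderate.
        score = entropy * np.asarray(visual_sensitivity, dtype=float)
    k = max(1, int(round(keep_fraction * score.size)))
    mask = np.zeros(score.size, dtype=bool)
    mask[np.argsort(-score, kind="stable")[:k]] = True
    return mask


# Illustrative numbers only: token 2 is strongly vision-sensitive but has
# lower entropy than tokens 0 and 1.
entropy = np.array([2.0, 1.8, 0.6, 0.1])
sensitivity = np.array([0.05, 0.10, 0.90, 0.20])

entropy_only_mask = select_tokens(entropy)
coupled_mask = select_tokens(entropy, sensitivity)
```

In this toy input the entropy-only ranking keeps tokens 0 and 1 and skips the vision-sensitive token 2, while the product score keeps tokens 1 and 2. That mirrors the failure mode described above, under the stated assumptions.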