VEPO

technique · active · provisional · vepo-ec527e9e · 1 event · first seen 13d ago

Aliases: VEPO

Recent events (1)

5 · arXiv · cs.AI · 13d ago

VEPO: Vision-anchored token selection improves RL for visual reasoning

A new arXiv paper identifies a failure mode of entropy-based credit assignment in multimodal reinforcement learning: vision-sensitive tokens with naturally low entropy are systematically ignored, causing the mechanism to collapse in visual reasoning tasks. The authors propose VEPO (Vision-Entropy token-selection for Policy Optimization), which couples visual sensitivity with token entropy via a multiplicative scheme to redirect gradient credit toward tokens that are both visually grounded and semantically informative. VEPO outperforms entropy-only baselines by 2.28 points at 7B scale and 3.15 points at 3B scale on visual reasoning benchmarks.
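
To make the multiplicative coupling concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say how visual sensitivity is measured or how tokens are chosen, so the per-token `sensitivities` values, the top-fraction selection rule, and the names `token_entropy`, `select_tokens`, and `keep_fraction` are all hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_tokens(entropies, sensitivities, keep_fraction=0.5):
    """Return a 0/1 mask over tokens, keeping the highest entropy * sensitivity.

    Assumption: `sensitivities` holds one non-negative score per token that
    says how strongly the token depends on the image. How that score is
    computed is not given in the summary, so it is taken as an input here.
    """
    if len(entropies) != len(sensitivities):
        raise ValueError("entropies and sensitivities must have equal length")
    # Multiplicative coupling: a token scores high only if both factors are non-trivial.
    scores = [h * s for h, s in zip(entropies, sensitivities)]
    k = max(1, math.ceil(keep_fraction * len(scores)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [0] * len(scores)
    for i in top:
        mask[i] = 1
    return mask


# Toy sequence of four tokens (made-up numbers for illustration only).
entropies = [0.05, 1.20, 0.30, 0.90]
sensitivities = [0.10, 0.05, 0.95, 0.60]

# Entropy-only selection: every token gets the same sensitivity of 1.0.
entropy_only_mask = select_tokens(entropies, [1.0] * len(entropies))

# Coupled selection: entropy multiplied by visual sensitivity.
coupled_mask = select_tokens(entropies, sensitivities)
```

In this toy case the entropy-only mask keeps tokens 1 and 3, while the coupled mask keeps tokens 2 and 3: token 2 has low entropy but high visual sensitivity, which is the kind of token the summary says entropy-only credit assignment overlooks. In an RL setup, such a mask would typically gate which tokens contribute to the policy-gradient loss.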