Entity · technique

VEPO

technique · active · vepo-ec527e9e · 1 event · first seen Jun 3, 2026

Aliases: VEPO

Co-occurring entities

Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection

More like this (12)

SERPO VASAE PEVA VLESA EEVEE FastV O-VAD EG-VQA CVXPY Veo DPPO CVDP

Recent events (1)

5 · arXiv · cs.AI · Jun 3, 2026

VEPO: Vision-anchored token selection improves RL for visual reasoning

A new arXiv paper identifies a failure mode of entropy-based credit assignment in multimodal reinforcement learning: vision-sensitive tokens with naturally low entropy are systematically ignored, causing the mechanism to collapse in visual reasoning tasks. The authors propose VEPO (Vision-Entropy token-selection for Policy Optimization), which couples visual sensitivity with token entropy via a multiplicative scheme to redirect gradient credit toward tokens that are both visually grounded and semantically informative. VEPO outperforms entropy-only baselines by 2.28 points at 7B scale and 3.15 points at 3B scale on visual reasoning benchmarks.
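The summary above does not give the paper's exact formulas, so the following is only a minimal sketch of the general idea of multiplicative coupling. It assumes that a per-token visual-sensitivity score is already available (how the authors compute it is not stated here), and the function names, the toy numbers, and the top-fraction selection rule are all hypothetical illustrations rather than the authors' method.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def coupled_scores(entropies, visual_sensitivities):
    """Multiplicative coupling: a token scores high only when BOTH its
    entropy and its visual sensitivity are non-negligible.
    (Assumed form; the paper's exact scheme is not given in the summary.)"""
    return [h * v for h, v in zip(entropies, visual_sensitivities)]


def select_tokens(scores, keep_fraction):
    """Indices of the top `keep_fraction` of tokens by score.
    The top-fraction rule is a hypothetical choice for illustration."""
    k = max(1, math.ceil(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


# Toy values, invented for illustration only.
# Token 0: high entropy, barely depends on the image.
# Token 1: low entropy, strongly depends on the image.
entropies = [1.2, 0.3, 1.0, 0.05]
visual = [0.05, 0.9, 0.6, 0.1]

entropy_only = select_tokens(entropies, keep_fraction=0.5)
coupled = select_tokens(coupled_scores(entropies, visual), keep_fraction=0.5)

print("entropy-only selection:", sorted(entropy_only))
print("coupled selection:     ", sorted(coupled))
```

In this toy case, entropy-only selection keeps token 0 and skips the vision-sensitive token 1, while the coupled score does the reverse. That mirrors the failure mode described above, although real sensitivity scores and selection thresholds would come from the paper itself.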

Alignment and RLHF · Multimodal Progress · VEPO · Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection