Entity · technique

decoupled reinforcement learning

technique · active · decoupled-reinforcement-learning-f05d9f49 · 1 event · first seen May 28, 2026

Aliases: decoupled reinforcement learning

Co-occurring entities

Multimodal Large Language Models · multimodal meta-verification · M1-TTS · symbolic meta-verification · bounding box · symbolic rationales · rule-based reinforcement learning rewards · OmniVerifier-M1 · symbolic verifier outputs

More like this (12)

decision-content decoupled reinforcement learning · shielded reinforcement learning · Active Offline-to-Online Reinforcement Learning · Physics-EnhAnced Reinforcement Learning · sim-to-real reinforcement learning · curiosity-driven reinforcement learning · rule-based reinforcement learning rewards · Constrained Reinforcement Learning · Reinforcement Learning for Code · Entropy-Regularized Reinforcement Learning · UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning · Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

Recent events (1)

arXiv · cs.AI · May 28, 2026

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

OmniVerifier-M1 is a generalist visual verifier trained with symbolic meta-verification rationales (e.g., bounding boxes) and with separate reinforcement learning objectives for binary judgment and for meta-verification. The paper finds that symbolic verifier outputs outperform textual explanations as rationales and allow rule-based RL rewards without auxiliary judge models, and that decoupling the two RL objectives substantially improves performance over joint optimization. The system also enables M1-TTS, a verifier-driven agentic generation pipeline that supports dynamic region-level self-correction of multimodal outputs.
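
To make the idea of rule-based rewards over symbolic rationales concrete, here is a minimal Python sketch. It is not the paper's implementation: the box format, the IoU threshold of 0.5, the equal-weight joint reward, and every function and class name below are assumptions chosen for illustration. It shows how a bounding-box rationale can be scored by a simple rule (overlap with a reference box) with no judge model, and how the judgment and rationale rewards can be kept as two separate signals instead of being merged into one scalar.

```python
from dataclasses import dataclass

# Hypothetical box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def judgment_reward(pred: bool, gold: bool) -> float:
    """Rule-based reward for the binary judgment: exact match only."""
    return 1.0 if pred == gold else 0.0


def rationale_reward(pred_box: Box, gold_box: Box, threshold: float = 0.5) -> float:
    """Rule-based reward for the symbolic rationale.

    Assumption: a predicted box counts as correct when its IoU with the
    reference box reaches the threshold. No auxiliary judge model is used.
    """
    return 1.0 if iou(pred_box, gold_box) >= threshold else 0.0


@dataclass
class Rewards:
    judgment: float   # signal for the binary-judgment objective
    rationale: float  # signal for the meta-verification objective


def decoupled_rewards(pred_ok: bool, pred_box: Box,
                      gold_ok: bool, gold_box: Box) -> Rewards:
    """Return the two rewards separately so each objective can be optimized on its own."""
    return Rewards(judgment_reward(pred_ok, gold_ok),
                   rationale_reward(pred_box, gold_box))


def joint_reward(r: Rewards) -> float:
    """Joint-optimization baseline for contrast: one merged scalar (equal weights assumed)."""
    return 0.5 * (r.judgment + r.rationale)


if __name__ == "__main__":
    # Wrong judgment, but the rationale box matches the reference exactly.
    r = decoupled_rewards(False, (0, 0, 2, 2), True, (0, 0, 2, 2))
    print(r, joint_reward(r))
```

In this sketch the decoupled form preserves which of the two outputs was wrong, whereas the merged scalar gives the same value for "wrong judgment, right box" and "right judgment, wrong box". How the two signals are actually consumed by the training procedure is not described in the summary above and is left open here.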

Evaluation and Benchmarking · Agent and Tool Ecosystem · Multimodal Large Language Models · multimodal meta-verification · decoupled reinforcement learning