technique
decoupled reinforcement learning
technique · active · provisional
decoupled-reinforcement-learning-f05d9f49 · 1 event · first seen 20d ago
Aliases: decoupled reinforcement learning
More like this (12)
decision-content decoupled reinforcement learning
shielded reinforcement learning
sim-to-real reinforcement learning
curiosity-driven reinforcement learning
rule-based reinforcement learning rewards
Constrained Reinforcement Learning
Reinforcement Learning for Code
Entropy-Regularized Reinforcement Learning
UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning
Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning
Disentangled Representation Learning
inference-time behavioural unlearning
Recent events (1)
OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration
OmniVerifier-M1 is a generalist visual verifier trained using symbolic meta-verification rationales (e.g., bounding boxes) and decoupled reinforcement learning objectives for binary judgment versus meta-verification. The paper finds that symbolic verifier outputs outperform textual explanations as rationales, enabling rule-based RL rewards without auxiliary judge models, and that decoupling RL objectives substantially improves performance over joint optimization. The system further enables M1-TTS, a verifier-driven agentic generation pipeline supporting dynamic region-level self-correction in multimodal outputs.
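The summary above mentions rule-based rewards computed from symbolic outputs such as bounding boxes, with separate objectives for binary judgment and for meta-verification. The paper's actual reward definitions are not given here, so the following is only a minimal sketch under stated assumptions: the function names, the exact-match judgment reward, the IoU-based meta-verification reward, and the 0.5 threshold are all hypothetical illustrations, not the authors' method.

```python
from typing import Dict, Tuple

# A box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2 (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def judgment_reward(pred: bool, gold: bool) -> float:
    """Hypothetical rule-based reward for the binary verdict: exact match."""
    return 1.0 if pred == gold else 0.0


def meta_verification_reward(pred_box: Box, gold_box: Box,
                             threshold: float = 0.5) -> float:
    """Hypothetical rule-based reward for the symbolic rationale.

    The predicted box counts as correct when its IoU with the reference
    box reaches the (assumed) threshold. No judge model is involved.
    """
    return 1.0 if iou(pred_box, gold_box) >= threshold else 0.0


def decoupled_rewards(pred_verdict: bool, gold_verdict: bool,
                      pred_box: Box, gold_box: Box) -> Dict[str, float]:
    """Return the two reward signals separately instead of summing them.

    Keeping them apart lets each objective be optimized on its own,
    which is one plausible reading of "decoupled" in the summary.
    """
    return {
        "judgment": judgment_reward(pred_verdict, gold_verdict),
        "meta_verification": meta_verification_reward(pred_box, gold_box),
    }
```

A joint objective would collapse these two values into one scalar before the policy update. Returning them as separate entries keeps that choice with the training loop, which is the part the summary reports as mattering.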