Entity · paper

Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

paper · active · using-reward-uncertainty-to-induce-diverse-behaviour-in-reinforcement-learning-a10412d1 · 1 event · first seen Jun 3, 2026

Aliases: Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

More like this (12)

Improving LLM-Generated Process Model Quality Through Reinforcement Learning: The Role of Reward Function Design
Gradient-Guided Reward Optimization
rule-based reinforcement learning rewards
curiosity-driven reinforcement learning
Physics-EnhAnced Reinforcement Learning
UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
reinforcement learning with belief-state rewards
Reinforcement Learning from Rich Feedback with Distributional DAgger
Entropy-Regularized Reinforcement Learning
decoupled reinforcement learning
Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection

Recent events (1)

5 · arXiv · cs.LG · Jun 3, 2026

Reward uncertainty as a principled mechanism for diverse RL behaviour

A new arXiv preprint proposes replacing the scalar reward in RL with a distribution over reward functions and optimising a non-linear objective over sets of actions, with the aim of inducing calibrated behavioural diversity without sacrificing expected reward. The authors derive a principled gradient estimator in the contextual bandit setting and prove that the formulation generalises both vanilla policy gradient and action-set approaches. The work is motivated by applications such as language model fine-tuning, where diversity is desirable but entropy regularization and diversity bonuses introduce fragile trade-offs. The authors report empirical results that support the framework as a theoretically grounded alternative to heuristic diversity methods.
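To make the idea concrete, here is a minimal sketch of how reward uncertainty combined with a set-level objective can produce diversity in a bandit. It is not the preprint's estimator. Everything specific in it is an assumption chosen for illustration: the objective (the expected best reward within a set of k actions drawn i.i.d. from the policy), the plain score-function (REINFORCE) gradient with a batch-mean baseline, the three-action toy problem, and all names (`set_objective_grad`, `train`, `REWARD_SAMPLES`). With k = 1 the objective is simply the expected reward, so the update reduces to vanilla policy gradient.

```python
import numpy as np


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def set_objective_grad(logits, reward_samples, k, n_sets, rng):
    """Score-function estimate of the gradient of
    E_{S ~ pi^k} E_r[ max_{a in S} r(a) ] with respect to the logits.

    reward_samples has shape (m, n_actions); each row is one sampled
    reward function (an illustrative stand-in for a reward distribution).
    """
    probs = softmax(logits)
    n_actions = logits.shape[0]
    # Draw n_sets sets of k actions, i.i.d. from the current policy.
    actions = rng.choice(n_actions, size=(n_sets, k), p=probs)
    # Set utility: best reward within the set, averaged over reward samples.
    values = reward_samples[:, actions].max(axis=2).mean(axis=0)
    # Gradient of the set's log-probability for a softmax policy:
    # sum_j (onehot(a_j) - probs) = counts - k * probs.
    counts = (actions[:, :, None] == np.arange(n_actions)).sum(axis=1)
    scores = counts - k * probs
    # Batch-mean baseline reduces variance (at the cost of a small O(1/n) bias).
    advantages = values - values.mean()
    return (advantages[:, None] * scores).mean(axis=0)


def train(reward_samples, k, steps=300, lr=1.0, n_sets=256, seed=0):
    rng = np.random.default_rng(seed)
    logits = np.zeros(reward_samples.shape[1])
    for _ in range(steps):
        logits += lr * set_objective_grad(logits, reward_samples, k, n_sets, rng)
    return softmax(logits)


# Toy problem (hypothetical): two equally likely reward functions.
# Actions 0 and 1 are each best under one of them; action 2 is a safe middle.
REWARD_SAMPLES = np.array([[1.0, 0.0, 0.6],
                           [0.0, 1.0, 0.6]])

policy_k1 = train(REWARD_SAMPLES, k=1)  # vanilla policy gradient
policy_k2 = train(REWARD_SAMPLES, k=2)  # set-level objective
print("k=1:", policy_k1.round(2))
print("k=2:", policy_k2.round(2))
```

In this toy setup the k = 1 policy concentrates on the safe action, because it has the highest expected reward, while the k = 2 policy keeps substantial mass on both actions 0 and 1, because a set containing both is guaranteed to include the best action under either reward function. The diversity here comes from the objective itself rather than from an entropy bonus, which is the intuition the summary above describes.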

Evaluation and Benchmarking · Alignment and RLHF · Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning