
Entropy-Regularized Reinforcement Learning

technique · active · entropy-regularized-reinforcement-learning-31aba912 · 1 event · first seen May 26, 2026

Aliases: Entropy-Regularized Reinforcement Learning

Co-occurring entities

Optimal Transport · Langevin Dynamics · Soft Q-Function · Polyak–Łojasiewicz Condition · Wasserstein Policy Gradient · Log-Sobolev Inequality

More like this (12)

KL-regularized RL
Entropy Regularization
Physics-EnhAnced Reinforcement Learning
Divergence Regularized Policy Optimization
rule-based reinforcement learning rewards
Constrained Reinforcement Learning
Improving LLM-Generated Process Model Quality Through Reinforcement Learning: The Role of Reward Function Design
Gradient-Guided Reward Optimization
shielded reinforcement learning
Target Distribution Regularization
ExpRL: Exploratory RL for LLM Mid-Training
SERPO: Self-Evolving Rubric Policy Optimization for Open-Ended Test-Time Reinforcement Learning

Recent events (1)

5 · arXiv · cs.LG · May 26, 2026

Global Convergence Theory for Wasserstein Policy Gradient in Entropy-Regularized RL

This paper establishes the first global convergence theory for Wasserstein Policy Gradient (WPG), a continuous-control RL optimization method that uses optimal-transport geometry over action distributions. The authors show that the Bellman recursion structure of entropy-regularized RL induces a Polyak–Łojasiewicz (PL) geometry that substitutes for classical convexity, enabling global convergence analysis. Key technical contributions include a statewise KL representation of the soft Bellman residual, a Bellman resolvent identity linking value improvement to relative Fisher information, and a uniform log-Sobolev inequality for the evolving Gibbs policy family. The result yields geometric contraction up to discretization bias, providing theoretical grounding for WPG in continuous-action settings.

AI Safety Research · Optimal Transport · Langevin Dynamics · Soft Q-Function (+4 more)
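
For readers new to the setting in the summary above, the following is a minimal tabular sketch of entropy-regularized RL, not the paper's continuous-action WPG algorithm. It runs soft value iteration on a toy MDP, forms the Gibbs policy pi(a|s) proportional to exp(Q(s,a)/tau), and numerically checks a standard identity in the same spirit as the statewise KL representation mentioned above: for any policy pi at a state, the soft value minus the entropy-regularized value of pi equals tau times KL(pi || Gibbs policy). The toy MDP, the temperature `tau`, the comparison policy, and all function names are assumptions made for illustration; the paper's exact statements are not reproduced here.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def gibbs_policy(q_row, tau):
    """Gibbs (softmax) policy: pi(a) proportional to exp(Q(a) / tau)."""
    z = logsumexp([q / tau for q in q_row])
    return [math.exp(q / tau - z) for q in q_row]

def soft_value(q_row, tau):
    """Soft state value: tau * log sum_a exp(Q(a) / tau)."""
    return tau * logsumexp([q / tau for q in q_row])

def soft_value_iteration(P, R, gamma, tau, iters=500):
    """Iterate the soft Bellman operator on a tabular MDP.

    P[s][a] is a list of next-state probabilities, R[s][a] a scalar reward.
    Returns (Q, V) with V[s] == soft_value(Q[s], tau) by construction.
    """
    n_states = len(R)
    V = [0.0] * n_states
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(len(R[s]))]
             for s in range(n_states)]
        V = [soft_value(Q[s], tau) for s in range(n_states)]
    return Q, V

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regularized_value(pi, q_row, tau):
    """Entropy-regularized value of pi at one state: E_pi[Q] + tau * H(pi)."""
    return sum(p * (q - tau * math.log(p)) for p, q in zip(pi, q_row) if p > 0)

# Hypothetical 2-state, 2-action MDP used only for illustration.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
gamma, tau = 0.9, 0.5

Q, V = soft_value_iteration(P, R, gamma, tau)
pi_any = [0.3, 0.7]  # arbitrary comparison policy, same at every state

for s in range(len(R)):
    pi_star = gibbs_policy(Q[s], tau)
    gap = V[s] - regularized_value(pi_any, Q[s], tau)
    # The two printed numbers should agree up to floating-point error.
    print(f"state {s}: gap = {gap:.6f}, tau * KL = {tau * kl(pi_any, pi_star):.6f}")
```

The gap is nonnegative and vanishes only when the comparison policy equals the Gibbs policy, which is why a KL term is a natural measure of suboptimality in this setting.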