Entropy-Regularized Reinforcement Learning
Recent events (1)
Global Convergence Theory for Wasserstein Policy Gradient in Entropy-Regularized RL
This paper establishes the first global convergence theory for Wasserstein Policy Gradient (WPG), a policy optimization method for continuous-control RL that uses optimal-transport geometry over action distributions. The authors show that the Bellman recursion structure of entropy-regularized RL induces a Polyak–Łojasiewicz (PL) geometry that substitutes for classical convexity and makes a global convergence analysis possible. Key technical contributions include a statewise KL representation of the soft Bellman residual, a Bellman resolvent identity linking value improvement to relative Fisher information, and a uniform log-Sobolev inequality for the evolving Gibbs policy family. Together these yield geometric contraction up to a discretization bias, providing theoretical grounding for WPG in continuous-action settings.
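To make the role of the Gibbs policy and the KL term more concrete, the sketch below numerically checks a standard identity from entropy-regularized RL at a single state: the gap between the soft (log-sum-exp) value and the regularized objective of any policy equals the temperature times the KL divergence from that policy to the Gibbs policy. This is only a toy illustration with a finite action set; it is not the paper's continuous-action statement or its exact representation of the soft Bellman residual. The temperature `tau`, the random Q-values, and all function names are assumptions introduced for the example.

```python
import numpy as np

def gibbs_policy(q, tau):
    """Gibbs (softmax) policy pi*(a) proportional to exp(q(a) / tau)."""
    z = (q - q.max()) / tau          # shift by the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def soft_value(q, tau):
    """Soft value tau * log sum_a exp(q(a) / tau), computed stably."""
    m = q.max()
    return float(m + tau * np.log(np.exp((q - m) / tau).sum()))

def regularized_objective(pi, q, tau):
    """Expected Q-value under pi plus tau times the Shannon entropy of pi."""
    return float(pi @ q - tau * (pi * np.log(pi)).sum())

def kl(p, r):
    """KL(p || r) for strictly positive discrete distributions."""
    return float((p * (np.log(p) - np.log(r))).sum())

# Hypothetical single-state example: 6 actions, arbitrary Q-values.
rng = np.random.default_rng(0)
tau = 0.5
q = rng.normal(size=6)
pi = rng.dirichlet(np.ones(6))       # an arbitrary full-support policy
pi_star = gibbs_policy(q, tau)

# Suboptimality of pi at this state, measured against the soft value.
gap = soft_value(q, tau) - regularized_objective(pi, q, tau)

# The identity being checked: gap == tau * KL(pi || pi_star).
print(np.isclose(gap, tau * kl(pi, pi_star)))
```

Because the KL divergence is nonnegative and vanishes only at `pi_star`, the check also shows why the Gibbs policy is the statewise maximizer of the regularized objective in this toy setting.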