Wasserstein Policy Gradient

technique · active · provisional · wasserstein-policy-gradient-7a392041 · 1 event · first seen 22d ago

Aliases: Wasserstein Policy Gradient

Recent events (1)

5 · arXiv · cs.LG · 22d ago

Global Convergence Theory for Wasserstein Policy Gradient in Entropy-Regularized RL

This paper establishes the first global convergence theory for Wasserstein Policy Gradient (WPG), a continuous-control RL optimization method that uses optimal-transport geometry over action distributions. The authors show that the Bellman recursion structure of entropy-regularized RL induces a Polyak–Łojasiewicz (PL) geometry that substitutes for classical convexity, enabling global convergence analysis. Key technical contributions include a statewise KL representation of the soft Bellman residual, a Bellman resolvent identity linking value improvement to relative Fisher information, and a uniform log-Sobolev inequality for the evolving Gibbs policy family. The result yields geometric contraction up to discretization bias, providing theoretical grounding for WPG in continuous-action settings.
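
As a rough illustration of why a PL-type inequality can stand in for convexity in the kind of rate described above, the generic argument can be sketched as follows. The symbols $F$, $F^\star$, $\mu$, $\lambda$, $c$, $b$, and $e_k$ are placeholders introduced here for illustration; they are not the paper's notation, and the paper's actual constants, assumptions, and error terms may differ.

```latex
% Generic PL argument (illustrative only; notation is assumed, not taken from the paper).

% 1. PL inequality for an objective F with optimal value F^\star:
\[
  \tfrac{1}{2}\,\lVert \nabla F(\theta) \rVert^{2}
  \;\ge\; \mu\,\bigl(F(\theta) - F^\star\bigr),
  \qquad \mu > 0 .
\]

% 2. Along the gradient flow \dot\theta_t = -\nabla F(\theta_t), the gap decays exponentially:
\[
  \frac{d}{dt}\bigl(F(\theta_t) - F^\star\bigr)
  = -\lVert \nabla F(\theta_t) \rVert^{2}
  \le -2\mu\,\bigl(F(\theta_t) - F^\star\bigr)
  \;\Longrightarrow\;
  F(\theta_t) - F^\star \le e^{-2\mu t}\,\bigl(F(\theta_0) - F^\star\bigr).
\]

% 3. Wasserstein analogue: a log-Sobolev inequality for a target \pi with constant \lambda
%    bounds KL by the relative Fisher information I, which is the squared Wasserstein
%    gradient norm of \rho \mapsto KL(\rho \| \pi):
\[
  \mathrm{KL}(\rho \,\|\, \pi) \;\le\; \frac{1}{2\lambda}\, I(\rho \,\|\, \pi),
  \qquad
  I(\rho \,\|\, \pi) = \int \Bigl\lVert \nabla \log \frac{\rho}{\pi} \Bigr\rVert^{2} \, d\rho .
\]
% Along the Wasserstein gradient flow of KL(\cdot \| \pi), d/dt KL = -I \le -2\lambda KL,
% so KL(\rho_t \| \pi) \le e^{-2\lambda t} KL(\rho_0 \| \pi).

% 4. Discrete time with a per-step bias b: a one-step contraction implies
%    geometric decay down to a bias floor.
\[
  e_{k+1} \le (1 - c)\, e_k + b, \quad 0 < c \le 1
  \;\Longrightarrow\;
  e_k \le (1 - c)^{k}\, e_0 + \frac{b}{c} .
\]
```

Read this way, a log-Sobolev inequality that holds uniformly over the evolving policy family plays the role of the PL constant, and "geometric contraction up to discretization bias" corresponds to the last line: the error shrinks by a fixed factor per step until it reaches a floor proportional to the bias.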