Entity · technique

Soft Q-Function

technique · active · soft-q-function-e75296f3 · 1 event · first seen May 26, 2026

Aliases: Soft Q-Function

Co-occurring entities

Optimal Transport · Langevin Dynamics · Polyak–Łojasiewicz Condition · Wasserstein Policy Gradient · Log-Sobolev Inequality · Entropy-Regularized Reinforcement Learning

More like this (12)

Soft Q-Learning · SoftReason · WebQSP · MorletQK · SFQ-Agent · factorized FSQ · Q-Former · QK-Restore · RULER QA-2 · SpeechEQ · QwQ-32B · Double Q-learning

Recent events (1)

5 · arXiv · cs.LG · May 26, 2026

Global Convergence Theory for Wasserstein Policy Gradient in Entropy-Regularized RL

This paper establishes the first global convergence theory for Wasserstein Policy Gradient (WPG), a continuous-control RL optimization method that uses optimal-transport geometry over action distributions. The authors show that the Bellman recursion structure of entropy-regularized RL induces a Polyak–Łojasiewicz (PL) geometry that substitutes for classical convexity, enabling global convergence analysis. Key technical contributions include a statewise KL representation of the soft Bellman residual, a Bellman resolvent identity linking value improvement to relative Fisher information, and a uniform log-Sobolev inequality for the evolving Gibbs policy family. The result yields geometric contraction up to discretization bias, providing theoretical grounding for WPG in continuous-action settings.
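For orientation, the soft Q-function and the Gibbs policy it induces can be sketched in a small tabular setting. This is a minimal illustration using the standard entropy-regularized definitions, not the paper's continuous-action WPG method: the function name `soft_value_iteration`, the temperature `tau`, the discount `gamma`, and the random toy MDP are all assumptions introduced here.

```python
import numpy as np

def soft_value_iteration(P, R, gamma=0.9, tau=0.5, iters=500):
    """Tabular soft value iteration (illustrative sketch).

    P: (S, A, S) transition probabilities, R: (S, A) rewards.
    Returns the soft Q-function, the soft value, and the Gibbs policy.
    """
    def soft_value(Q):
        # V(s) = tau * log sum_a exp(Q(s, a) / tau), computed stably
        m = Q.max(axis=1, keepdims=True)
        return (m + tau * np.log(np.exp((Q - m) / tau).sum(axis=1, keepdims=True))).ravel()

    Q = np.zeros_like(R)
    for _ in range(iters):
        # Soft Bellman backup: Q(s, a) = R(s, a) + gamma * E_{s'}[V(s')]
        Q = R + gamma * P @ soft_value(Q)

    V = soft_value(Q)
    # Gibbs (Boltzmann) policy: pi(a | s) = exp((Q(s, a) - V(s)) / tau)
    pi = np.exp((Q - V[:, None]) / tau)
    return Q, V, pi

# Hypothetical toy MDP with 3 states and 2 actions
rng = np.random.default_rng(0)
S, A = 3, 2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
R = rng.random((S, A))

Q, V, pi = soft_value_iteration(P, R)
residual = np.abs(Q - (R + 0.9 * P @ V)).max()  # soft Bellman residual at the fixed point
```

At the fixed point the soft Bellman residual vanishes and each row of `pi` is a probability distribution proportional to `exp(Q / tau)`. In the continuous-action setting the summary describes, the sum over actions would become an integral and the policy a Gibbs density.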

AI Safety Research · Optimal Transport · Langevin Dynamics · Soft Q-Function