[en] This presentation studies policy gradient algorithms that add an entropy regularizer to their learning objective to encourage exploration, i.e., algorithms that optimize maximum-entropy reinforcement learning objectives. In the first part, the influence of entropy regularization on policy gradient performance is discussed in light of numerical optimization theory. The focus is on how regularization reshapes the learning objective and how it helps satisfy key assumptions used in convergence proofs. In the second part, a maximum-entropy objective that enforces exploration of future states and actions is discussed. A concrete off-policy policy gradient algorithm is developed, discussed, and illustrated empirically.
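For readers unfamiliar with the term, a common statement of the maximum-entropy reinforcement learning objective augments the expected discounted return with the policy's entropy at each visited state. The notation and temperature parameter below are assumptions for illustration, not taken from the presentation itself.

% Sketch of a standard maximum-entropy RL objective (notation assumed, not from the presentation):
% the expected discounted return is augmented with the entropy of the policy at each state,
% weighted by a temperature \alpha that trades off reward maximization against exploration.
\[
  J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Bigl( r(s_t, a_t) \;+\; \alpha\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \Bigr) \right],
  \qquad
  \mathcal{H}\bigl(\pi(\cdot \mid s)\bigr) \;=\; -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s).
\]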
Disciplines :
Computer science
Author, co-author :
Bolland, Adrien ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Language :
English
Title :
Maximum Entropy RL and Policy Gradients: Why and What to Explore?
Publication date :
06 February 2026
Event name :
Mini-Workshop on Reinforcement Learning
Event organizer :
Leif Döring; Théo Vincent; Simon Weißmann; Claire Vernade