Scientific conference in universities or research centers
Maximum Entropy RL and Policy Gradients: Why and What to Explore?
Bolland, Adrien
2026
 

Files


Full Text
Maximum Entropy RL and Policy Gradients Why and What to Explore.pdf
Author postprint (7.5 MB) Creative Commons License - Attribution, Non-Commercial, No Derivative
Details



Keywords :
Reinforcement Learning; Exploration
Abstract :
[en] This presentation studies policy gradient algorithms that regularize entropy in their learning objective to encourage exploration, i.e., algorithms that optimize maximum-entropy reinforcement learning objectives. The first part discusses the influence of entropy regularization on the performance of policy gradients in light of numerical optimization theory, focusing on how the regularization reshapes the learning objective and how it helps satisfy key assumptions used in convergence proofs. The second part discusses a maximum-entropy objective that enforces exploration of future states and actions; a concrete off-policy policy-gradient algorithm is then developed, discussed, and illustrated empirically.
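For context, the maximum-entropy reinforcement learning objective referred to in the abstract is commonly written as the expected discounted return augmented with a policy-entropy bonus. The sketch below uses the standard formulation with a temperature parameter \alpha and discount factor \gamma; these conventions are assumptions for illustration and not necessarily the exact objective used in the presentation.
% Standard maximum-entropy RL objective (a sketch; the temperature \alpha
% and the discounting convention are assumptions, not taken from the slides).
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s).
Setting \alpha = 0 recovers the usual reinforcement learning objective; larger \alpha puts more weight on keeping the policy stochastic, which is the exploration mechanism the first part of the presentation analyses.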
Disciplines :
Computer science
Author, co-author :
Bolland, Adrien ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Language :
English
Title :
Maximum Entropy RL and Policy Gradients: Why and What to Explore?
Publication date :
06 February 2026
Event name :
Mini-Workshop on Reinforcement Learning
Event organizer :
Leif Döring
Théo Vincent
Simon Weißmann
Claire Vernade
Event place :
Mannheim, Germany
Event date :
February 6th, 2026
Audience :
International
Available on ORBi :
since 06 February 2026
