Paper published on a website (Scientific congresses and symposiums)
Behind the Myth of Exploration in Policy Gradients
Bolland, Adrien; Lambrechts, Gaspard; Ernst, Damien
2025, European Workshop on Reinforcement Learning
Peer reviewed
 

Files


Full Text
Behind the Myth of Exploration in Policy Gradients.pdf
Publisher postprint (1.02 MB) Creative Commons License - Attribution, ShareAlike
Details



Keywords :
reinforcement learning; exploration; policy gradients
Abstract :
To compute near-optimal policies with policy-gradient algorithms, it is common in practice to include intrinsic exploration terms in the learning objective. Although the effectiveness of these terms is usually justified by an intrinsic need to explore environments, we propose a novel analysis through the lens of numerical optimization. Two criteria are introduced on the learning objective and two others on its stochastic gradient estimates, and are then used to discuss the quality of the policy after optimization. The analysis sheds light on two separate effects of exploration techniques. First, they make it possible to smooth the learning objective and to eliminate local optima while preserving the global maximum. Second, they modify the gradient estimates, increasing the probability that the stochastic parameter updates eventually provide an optimal policy. We empirically illustrate these effects with exploration strategies based on entropy bonuses, identifying limitations and suggesting directions for future work.
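As a rough illustration of the entropy bonuses mentioned in the abstract, the sketch below adds a weighted entropy term to the expected return of a softmax policy in a one-step (bandit) problem. The one-step setting, the function names, and the `bonus_weight` parameter are assumptions made for illustration; they are not taken from the paper, whose exact objective may differ.

```python
import math


def softmax(logits):
    """Turn policy parameters (logits) into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(logits, rewards):
    """Expected reward of the softmax policy in a one-step problem."""
    probs = softmax(logits)
    return sum(p * r for p, r in zip(probs, rewards))


def entropy(logits):
    """Shannon entropy of the softmax policy (in nats)."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_objective(logits, rewards, bonus_weight):
    """Expected return plus a weighted entropy bonus (hypothetical form)."""
    return expected_return(logits, rewards) + bonus_weight * entropy(logits)
```

Setting `bonus_weight` to zero recovers the plain expected return, while larger values reward policies that spread probability over more actions.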
Disciplines :
Computer science
Author, co-author :
Bolland, Adrien; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Lambrechts, Gaspard; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Ernst, Damien; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Language :
English
Title :
Behind the Myth of Exploration in Policy Gradients
Publication date :
17 July 2025
Event name :
European Workshop on Reinforcement Learning
Event place :
Tübingen, Germany
Event date :
September 17th, 2025
Audience :
International
Peer review/Selection committee :
Peer reviewed
Source :
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Available on ORBi :
since 02 February 2024

