Eprint already available on another site (E-prints, working papers and research blog)
Behind the Myth of Exploration in Policy Gradients
Bolland, Adrien; Lambrechts, Gaspard; Ernst, Damien
2024

Files


Full Text
Behind the Myth of Exploration in Policy Gradients.pdf
Author preprint (1.15 MB) Creative Commons License - Attribution, Non-Commercial, ShareAlike

All documents in ORBi are protected by a user license.

Details



Keywords :
reinforcement learning; exploration; policy gradients
Abstract :
Policy-gradient algorithms are effective reinforcement learning methods for solving control problems with continuous state and action spaces. To compute near-optimal policies, it is essential in practice to include exploration terms in the learning objective. Although the effectiveness of these terms is usually justified by an intrinsic need to explore environments, we propose a novel analysis that distinguishes two different implications of these techniques. First, they make it possible to smooth the learning objective and to eliminate local optima while preserving the global maximum. Second, they modify the gradient estimates, increasing the probability that the stochastic parameter updates eventually yield an optimal policy. In light of these effects, we discuss exploration strategies based on entropy bonuses and illustrate them empirically, highlighting their limitations and opening avenues for future work on the design and analysis of such strategies.
Disciplines :
Computer science
Author, co-author :
Bolland, Adrien; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Lambrechts, Gaspard; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Ernst, Damien; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Language :
English
Title :
Behind the Myth of Exploration in Policy Gradients
Publication date :
January 2024
Funders :
F.R.S.-FNRS - Fund for Scientific Research [BE]
Available on ORBi :
since 02 February 2024

Statistics


Number of views
51 (17 by ULiège)
Number of downloads
16 (3 by ULiège)
