Behind the Myth of Exploration in Policy Gradients

Bolland, Adrien; Lambrechts, Gaspard; Ernst, Damien

Paper published on a website (Scientific congresses and symposiums)

2025 • European Workshop on Reinforcement Learning

Peer reviewed

Permalink
https://hdl.handle.net/2268/312658

Files

Full Text

Behind the Myth of Exploration in Policy Gradients.pdf

Publisher postprint (1.02 MB)

Creative Commons License - Attribution, ShareAlike

Details

Keywords :

reinforcement learning; exploration; policy gradients

Abstract :

To compute near-optimal policies with policy-gradient algorithms, it is common in practice to include intrinsic exploration terms in the learning objective. Although the effectiveness of these terms is usually justified by an intrinsic need to explore environments, we propose a novel analysis through the lens of numerical optimization. Two criteria are introduced on the learning objective and two others on its stochastic gradient estimates; they are then used to discuss the quality of the policy after optimization. The analysis sheds light on two separate effects of exploration techniques. First, they make it possible to smooth the learning objective and to eliminate local optima while preserving the global maximum. Second, they modify the gradient estimates, increasing the probability that the stochastic parameter updates eventually provide an optimal policy. We empirically illustrate these effects with exploration strategies based on entropy bonuses, identifying limitations and suggesting directions for future work.
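
As a rough illustration of what "including an exploration term in the learning objective" can look like, the sketch below adds an entropy bonus to the expected reward of a softmax policy on a small multi-armed bandit and computes the exact gradient of the resulting objective. This is not the authors' code or experimental setup: the bandit setting, the function names (`softmax`, `objective`, `gradient`, `gradient_ascent`), the bonus weight `lam`, and the step size are all assumptions made for this example.

```python
import math


def softmax(theta):
    """Softmax policy over arms, computed in a numerically stable way."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def objective(theta, rewards, lam):
    """Expected reward plus lam times the policy entropy (illustrative objective)."""
    pi = softmax(theta)
    expected_reward = sum(p * r for p, r in zip(pi, rewards))
    entropy = -sum(p * math.log(p) for p in pi if p > 0.0)
    return expected_reward + lam * entropy


def gradient(theta, rewards, lam):
    """Exact gradient of `objective` with respect to the softmax logits.

    For a softmax policy, d/d(theta_i) = pi_i * (a_i - sum_j pi_j * a_j),
    where a_j = r_j - lam * log(pi_j) is the entropy-augmented reward.
    """
    pi = softmax(theta)
    # Guard against log(0) if a probability underflows to zero.
    aug = [r - lam * math.log(max(p, 1e-300)) for p, r in zip(pi, rewards)]
    baseline = sum(p * a for p, a in zip(pi, aug))
    return [p * (a - baseline) for p, a in zip(pi, aug)]


def gradient_ascent(theta, rewards, lam, step_size=0.5, n_steps=200):
    """Plain deterministic gradient ascent on the regularized objective."""
    theta = list(theta)
    for _ in range(n_steps):
        g = gradient(theta, rewards, lam)
        theta = [t + step_size * gi for t, gi in zip(theta, g)]
    return theta
```

With `lam = 0.0` the objective reduces to the plain expected reward; increasing `lam` rewards policies with higher entropy, which changes the shape of the function being optimized. The sketch uses exact gradients, so it only touches on the first effect discussed in the abstract (the shape of the objective), not the behaviour of stochastic gradient estimates.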

Disciplines :

Computer science

Author, co-author :

Bolland, Adrien ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids

Lambrechts, Gaspard ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids

Ernst, Damien ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids

Language :

English

Title :

Behind the Myth of Exploration in Policy Gradients

Publication date :

17 July 2025

Event name :

European Workshop on Reinforcement Learning

Event place :

Tübingen, Germany

Event date :

September 17th, 2025

Audience :

International

Peer review/Selection committee :

Peer reviewed

Source :

EWRL 2025

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique

Available on ORBi :

since 02 February 2024

Statistics

Number of views

214 (50 by ULiège)

Number of downloads

59 (7 by ULiège)