Paper published on a website (Scientific congresses and symposiums)
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
Bolland, Adrien; Lambrechts, Gaspard; Ernst, Damien
2025, European Workshop on Reinforcement Learning
Peer reviewed
 

Files


Full Text
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures.pdf
Author postprint (2.43 MB) Creative Commons License - Attribution, ShareAlike

Details



Keywords :
Reinforcement Learning; Exploration
Abstract :
Maximum entropy reinforcement learning integrates exploration into policy learning by providing additional intrinsic rewards proportional to the entropy of some distribution. In this paper, we propose a novel approach in which the intrinsic reward function is the relative entropy of the discounted distribution of states and actions (or features derived from these states and actions) visited during future time steps. This approach is motivated by two results. First, a policy maximizing the expected discounted sum of intrinsic rewards also maximizes a lower bound on the state-action value function of the decision process. Second, the distribution used in the intrinsic reward definition is the fixed point of a contraction operator. Existing algorithms can therefore be adapted to learn this fixed point off-policy and to compute the intrinsic rewards. We finally introduce an algorithm maximizing our new objective, and we show that resulting policies have good state-action space coverage and achieve high-performance control.
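The fixed-point property mentioned in the abstract can be illustrated on a small tabular problem. The sketch below is not the authors' algorithm. It assumes a finite MDP with known transition probabilities, tracks future states only (not state-action pairs or features), and uses the standard recursion d = (1 - gamma) P + gamma P_pi d for the discounted future visitation measure. The names `future_visitation` and `intrinsic_reward`, the uniform target distribution, and the sign convention (negative KL divergence) are assumptions made for illustration.

```python
import numpy as np

def future_visitation(P, pi, gamma, tol=1e-10, max_iter=10_000):
    """Iterate d <- (1 - gamma) * P + gamma * P_pi d until it stops changing.

    P:  array (S, A, S), transition probabilities P[s, a, s'].
    pi: array (S, A), policy probabilities pi[s, a].
    Returns d of shape (S, A, S): d[s, a, :] is the discounted distribution
    of future states visited after taking a in s and then following pi.
    """
    S, A, _ = P.shape
    d = np.full((S, A, S), 1.0 / S)  # any initial distribution works
    for _ in range(max_iter):
        # sum over next state t and next action b of P[s,a,t] pi[t,b] d[t,b,:]
        d_next = np.einsum("sat,tb,tbx->sax", P, pi, d)
        d_new = (1.0 - gamma) * P + gamma * d_next
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d

def intrinsic_reward(d, target, eps=1e-12):
    """Negative KL(d[s, a, :] || target) for every (s, a) pair.

    Assumption: the reward is the negative relative entropy to a fixed
    target distribution; the paper's exact definition may differ.
    """
    log_ratio = np.log(d + eps) - np.log(target + eps)
    return -np.sum(d * log_ratio, axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma = 4, 2, 0.9
    P = rng.random((S, A, S))
    P /= P.sum(axis=-1, keepdims=True)
    pi = rng.random((S, A))
    pi /= pi.sum(axis=-1, keepdims=True)
    d = future_visitation(P, pi, gamma)
    print(intrinsic_reward(d, np.full(S, 1.0 / S)))
```

Because the update shrinks the sup-norm distance between any two candidates by a factor gamma, the loop converges from any starting point. With a uniform target, the reward equals the entropy of d[s, a, :] minus log S, so it is largest for state-action pairs whose future visitation is most spread out.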
Disciplines :
Computer science
Author, co-author :
Bolland, Adrien; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Lambrechts, Gaspard; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Ernst, Damien; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Language :
English
Title :
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
Publication date :
17 July 2025
Event name :
European Workshop on Reinforcement Learning
Event place :
Tübingen, Germany
Event date :
17 September 2025
Audience :
International
Peer review/Selection committee :
Peer reviewed
Source :
Funders :
F.R.S.-FNRS - Fund for Scientific Research
Available on ORBi :
since 11 December 2024

Statistics


Number of views
218 (34 by ULiège)
Number of downloads
86 (3 by ULiège)
