Eprint already available on another site (E-prints, working papers and research blog)
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
Bolland, Adrien; Lambrechts, Gaspard; Ernst, Damien
2024
 

Files


Full Text
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures.pdf
Author preprint (2.52 MB) Creative Commons License - Attribution, Non-Commercial, No Derivative

Details



Keywords :
Reinforcement Learning; Exploration
Abstract :
We introduce a new maximum entropy reinforcement learning framework based on the distribution of states and actions visited by a policy. More precisely, an intrinsic reward function is added to the reward function of the Markov decision process to be controlled. For each state and action, this intrinsic reward is the relative entropy of the discounted distribution of states and actions (or features of these states and actions) visited during the next time steps. We first prove that, under some assumptions, an optimal exploration policy, which maximizes the expected discounted sum of intrinsic rewards, also maximizes a lower bound on the state-action value function of the decision process. We also prove that the visitation distribution used in the intrinsic reward definition is the fixed point of a contraction operator. We then describe how to adapt existing algorithms to learn this fixed point and compute the intrinsic rewards to enhance exploration. Finally, we introduce a new practical off-policy maximum entropy reinforcement learning algorithm. Empirically, exploration policies have good state-action space coverage, and high-performing control policies are computed efficiently.
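As an illustration of the fixed-point property stated in the abstract, the following is a minimal tabular sketch and not the authors' algorithm. It assumes a small finite MDP with a known transition tensor `P` and policy `pi` (both hypothetical, randomly generated here), defines the conditional discounted visitation measure over the state-action pairs visited from the next time step onward, and takes the intrinsic reward to be the negative Kullback-Leibler divergence to a uniform reference distribution. The choice of reference, the sign convention, and all function names are assumptions made for this example; the paper should be consulted for the exact definitions.

```python
import numpy as np


def transition_matrix(P, pi):
    """Kernel over state-action pairs: M[(s,a),(s',a')] = P[s,a,s'] * pi[s',a']."""
    S, A, _ = P.shape
    M = P[:, :, :, None] * pi[None, None, :, :]  # shape (S, A, S, A)
    return M.reshape(S * A, S * A)


def visitation_fixed_point(M, gamma, tol=1e-10, max_iter=10_000):
    """Iterate d <- (1 - gamma) * M + gamma * M @ d until convergence.

    Because M is row-stochastic, this map is a gamma-contraction in the
    sup norm, so the iteration converges to a unique fixed point.
    """
    d = np.full_like(M, 1.0 / M.shape[1])  # any row-stochastic start works
    for _ in range(max_iter):
        d_next = (1.0 - gamma) * M + gamma * M @ d
        if np.max(np.abs(d_next - d)) < tol:
            return d_next
        d = d_next
    return d


def intrinsic_reward(d, eps=1e-12):
    """Assumed reward: negative KL divergence from each row of d to uniform."""
    n = d.shape[1]
    kl = np.sum(d * (np.log(d + eps) - np.log(1.0 / n)), axis=1)
    return -kl


# Hypothetical random MDP and policy, for illustration only.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
pi = rng.dirichlet(np.ones(A), size=S)      # shape (S, A)

M = transition_matrix(P, pi)
d = visitation_fixed_point(M, gamma)
r_int = intrinsic_reward(d).reshape(S, A)   # one intrinsic reward per (s, a)
```

Under these assumptions, each row of `d` is a probability distribution over future state-action pairs, and a pair whose future visitation is spread more evenly receives a higher (less negative) intrinsic reward. In the tabular case the fixed point can also be checked against the closed form `(1 - gamma) * inv(I - gamma * M) @ M`.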
Disciplines :
Computer science
Author, co-author :
Bolland, Adrien ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Lambrechts, Gaspard ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Ernst, Damien ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Language :
English
Title :
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
Publication date :
2024
Available on ORBi :
since 11 December 2024
