In this work, we generalize the problem of learning through interaction in a POMDP by accounting for possible additional information available at training time. First, we introduce the informed POMDP, a new learning paradigm offering a clear distinction between the information at training and the observation at execution. Next, we propose an objective that leverages this information for learning a sufficient statistic of the history for optimal control. We then adapt this informed objective to learn a world model able to sample latent trajectories. Finally, we empirically show a learning speed improvement in several environments using this informed world model in the Dreamer algorithm. These results and the simplicity of the proposed adaptation advocate for a systematic consideration of possible additional information when learning in a POMDP using model-based RL.
Disciplines :
Computer science
Author, co-author :
Lambrechts, Gaspard ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Bolland, Adrien ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Ernst, Damien ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Language :
English
Title :
Informed POMDP: Leveraging Additional Information in Model-Based RL
Publication date :
August 2024
Event name :
Reinforcement Learning Conference
Event place :
Amherst, United States - Massachusetts
Event date :
August 9th, 2024
Audience :
International
Journal title :
Reinforcement Learning Journal
ISSN :
2996-8569
eISSN :
2996-8577
Peer review/Selection committee :
Peer reviewed
Tags :
CÉCI : Consortium des Équipements de Calcul Intensif Tier-1 supercomputer