Abstract :
Reinforcement learning aims to learn optimal policies from interaction with
environments whose dynamics are unknown. Many methods rely on the approximation
of a value function to derive near-optimal policies. In partially observable
environments, these functions depend on the complete sequence of observations
and past actions, called the history. In this work, we show empirically that
recurrent neural networks trained to approximate such value functions
internally filter the posterior probability distribution of the current state
given the history, called the belief. More precisely, we show that, as a
recurrent neural network learns the Q-function, its hidden states become increasingly correlated with the beliefs of state variables that are relevant to
optimal control. This correlation is measured through their mutual information.
In addition, we show that the expected return of an agent increases with the
ability of its recurrent architecture to reach a high mutual information
between its hidden states and the beliefs. Finally, we show that the mutual
information between the hidden states and the beliefs of variables that are
irrelevant to optimal control decreases over the course of learning. In
summary, this work shows that in its hidden states, a recurrent neural network
approximating the Q-function of a partially observable environment reproduces a
sufficient statistic from the history that is correlated with the part of the belief that is relevant for taking optimal actions.
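As a concrete illustration of the kind of measurement described above, the sketch below shows a recurrent Q-network whose hidden states are extracted and compared, via a mutual-information estimate, with a belief feature. This is a minimal sketch and not the authors' code: the GRU architecture, the names (RecurrentQNetwork, hidden_belief_mi) and the use of scikit-learn's kNN-based mutual_info_regression are illustrative assumptions; the environments, architectures and estimator used in the paper may differ.

```python
# Minimal sketch (not the authors' code): a GRU-based recurrent Q-network and a
# crude estimate of the mutual information between its hidden states and a
# belief feature. All names, shapes and the estimator are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import mutual_info_regression


class RecurrentQNetwork(nn.Module):
    """Maps a history of (observation, previous action) pairs to Q-values."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + num_actions, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_actions)

    def forward(self, histories: torch.Tensor):
        # histories: (batch, time, obs_dim + num_actions)
        hidden_states, _ = self.rnn(histories)       # (batch, time, hidden_dim)
        q_values = self.head(hidden_states)          # (batch, time, num_actions)
        return q_values, hidden_states


def hidden_belief_mi(hidden_states: np.ndarray, belief_feature: np.ndarray) -> float:
    """Per-coordinate kNN estimate of I(hidden coordinate; belief feature), in nats.

    hidden_states: (num_samples, hidden_dim); belief_feature: (num_samples,).
    Returns the largest per-coordinate value as a rough proxy; estimating the
    mutual information with the full hidden vector would require a joint
    estimator (e.g. a neural one).
    """
    mi_per_coordinate = mutual_info_regression(hidden_states, belief_feature)
    return float(mi_per_coordinate.max())


if __name__ == "__main__":
    # Synthetic example: random histories, hidden states extracted from the network,
    # and a placeholder "belief" feature that here is just a random target.
    obs_dim, num_actions, batch, horizon = 4, 3, 256, 10
    net = RecurrentQNetwork(obs_dim, num_actions)
    histories = torch.randn(batch, horizon, obs_dim + num_actions)
    _, hidden = net(histories)
    hidden_last = hidden[:, -1, :].detach().numpy()  # hidden state after the full history
    belief_feature = np.random.rand(batch)           # stand-in for a belief coordinate
    print("estimated MI (nats):", hidden_belief_mi(hidden_last, belief_feature))
```

In the setting of the paper, the belief feature would be the filtered posterior probability of a state variable that is relevant (or irrelevant) to optimal control, computed from the known environment dynamics, rather than the random placeholder used in this synthetic example.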
Disciplines :
Computer science
Author, co-author :
Lambrechts, Gaspard ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Bolland, Adrien ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Ernst, Damien ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Language :
English
Title :
Belief states of POMDPs and internal states of recurrent RL agents: an empirical analysis of their mutual information