Offline Policy-search in Bayesian Reinforcement Learning

[en] This thesis presents research contributions in the field of Bayesian Reinforcement Learning, a subfield of Reinforcement Learning in which the dynamics of the system are unknown but some prior knowledge is assumed to be available in the form of a distribution over Markov decision processes. Two algorithms are presented: OPPS (Offline Prior-based Policy Search) and ANN-BRL (Artificial Neural Networks for Bayesian Reinforcement Learning). Both analyse and exploit the knowledge available beforehand, prior to interacting with the system(s); they differ in the nature of the model they use. The former uses the formula-based agents introduced by Maes et al. (Maes, Wehenkel, and Ernst, 2012), while the latter relies on Artificial Neural Networks built via SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss function), an AdaBoost algorithm developed by Zhu et al. (Zhu et al., 2009). We also describe a comprehensive benchmark created to compare Bayesian Reinforcement Learning algorithms. In real-life applications, the choice of the best agent to fulfil a given task depends not only on its performance, but also on the computation time required to deploy it. The benchmark was designed to identify the best algorithms by taking both criteria into account, and resulted in the development of an open-source library: BBRL (Benchmarking tools for Bayesian Reinforcement Learning) (https://github.com/mcastron/BBRL/wiki).
[fr] This dissertation presents several scientific contributions in the field of Bayesian reinforcement learning, in which the dynamics of the system are unknown and prior knowledge is available in the form of a distribution over a set of Markov decision processes. We first present two algorithms, OPPS (Offline Prior-based Policy Search) and ANN-BRL (Artificial Neural Networks for Bayesian Reinforcement Learning), whose philosophy rests on analysing and exploiting this prior knowledge before starting to interact with the system(s). These methods differ in the nature of their model. The first uses formula-based agents introduced by Maes et al. (Maes, Wehenkel, and Ernst, 2012), while the second relies on artificial neural networks built with SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss function), an AdaBoost algorithm developed by Zhu et al. (Zhu et al., 2009). We also describe an experimental protocol that we designed to compare Bayesian reinforcement learning algorithms with one another. In real applications, the choice of the best agent for a specific task depends not only on its performance, but also on the computation time required to deploy it. This experimental protocol determines which algorithm is best suited to a given task by taking both criteria into account. It has been made available to the scientific community as a free software library: BBRL (Benchmarking tools for Bayesian Reinforcement Learning) (https://github.com/mcastron/BBRL/wiki).

Disciplines :

Computer science

Author, co-author :

Castronovo, Michaël ; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

Language :

English

Title :

Offline Policy-search in Bayesian Reinforcement Learning

Alternative titles :

[fr] Recherche directe de politique hors-ligne en apprentissage par renforcement Bayésien

Defense date :

15 March 2017

Number of pages :

115

Institution :

ULiège - Université de Liège, Liège, Belgium

Degree :

Docteur en sciences, orientation informatique

Promotor :

Ernst, Damien ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

President :

Wehenkel, Louis ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Jury member :

Van Droogenbroeck, Marc ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Fonteneau, Raphaël ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Nowé, Ann

Lazaric, Alessandro

Tags :

CÉCI : Consortium des Équipements de Calcul Intensif

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique

Available on ORBi :

since 20 March 2017

Statistics

Number of views

288 (37 by ULiège)

Number of downloads

413 (19 by ULiège)
