Paper published in a book (Scientific congresses and symposiums)
Bayes Adaptive Reinforcement Learning versus Off-line Prior-based Policy Search: an Empirical Comparison
Castronovo, Michaël; Ernst, Damien; Fonteneau, Raphaël
2014, in Proceedings of the 23rd annual machine learning conference of Belgium and the Netherlands (BENELEARN 2014)
Peer reviewed
 

Files


Full Text
BAMCP_vs_OPPS_v3.pdf
Author postprint (393.06 kB)



Details



Keywords :
Reinforcement Learning
Abstract :
[en] This paper addresses the problem of decision making in unknown finite Markov decision processes (MDPs). The uncertainty about the MDPs is modeled using a prior distribution over a set of candidate MDPs. The performance criterion is the expected sum of discounted rewards collected over an infinite-length trajectory. Time constraints are defined as follows: (i) an off-line phase with a given time budget can be used to exploit the prior distribution, and (ii) at every time step of the on-line phase, decisions have to be computed within a given time budget. In this setting, we compare two decision-making strategies: OPPS, a recently proposed meta-learning scheme which mainly exploits the off-line phase to perform policy search, and BAMCP, a state-of-the-art model-based Bayesian reinforcement learning algorithm which mainly exploits the on-line time budget. We empirically compare these approaches in a real Bayesian setting by computing their performance over a large set of problems; to the best of our knowledge, this is the first time such a comparison has been carried out in the reinforcement learning literature. Several settings are considered by varying the prior distribution and the distribution from which test problems are drawn. The main finding of these experiments is that there may be a significant benefit to having an off-line prior-based optimization phase in the case of informative and accurate priors, especially when on-line time constraints are tight.
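For reference, a minimal LaTeX sketch of the performance criterion described in the abstract; the notation (M for a candidate MDP drawn from the prior p, \pi for a decision-making strategy, \gamma for the discount factor, r_t for the reward collected at step t) is assumed here for illustration and is not taken from the paper itself:

    % Expected discounted return of strategy \pi on a fixed MDP M (infinite horizon)
    J^{\pi}_{M} = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, M, \pi \right], \qquad 0 \le \gamma < 1
    % Bayesian performance: the same return averaged over the prior distribution on candidate MDPs
    J^{\pi}_{p} = \mathbb{E}_{M \sim p(\cdot)}\!\left[ J^{\pi}_{M} \right]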
Disciplines :
Computer science
Author, co-author :
Castronovo, Michaël ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Smart grids
Ernst, Damien ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Smart grids
Fonteneau, Raphaël ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Systems and Modeling
Language :
English
Title :
Bayes Adaptive Reinforcement Learning versus Off-line Prior-based Policy Search: an Empirical Comparison
Alternative titles :
[fr] Apprentissage par renforcement bayésien versus recherche directe de politique hors-ligne en utilisant une distribution a priori: comparaison empirique
Publication date :
June 2014
Event name :
23rd annual machine learning conference of Belgium and the Netherlands (BENELEARN 2014)
Event place :
Brussels, Belgium
Event date :
June 2014
Audience :
International
Main work title :
Proceedings of the 23rd annual machine learning conference of Belgium and the Netherlands (BENELEARN 2014)
Peer reviewed :
Peer reviewed
Tags :
CÉCI : Consortium des Équipements de Calcul Intensif
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]
CÉCI - Consortium des Équipements de Calcul Intensif [BE]
Available on ORBi :
since 07 May 2014

Statistics


Number of views
356 (87 by ULiège)
Number of downloads
218 (29 by ULiège)
