Article (Scientific journals)
Tree-based batch mode reinforcement learning
Ernst, Damien; Geurts, Pierre; Wehenkel, Louis
2005, in Journal of Machine Learning Research, 6, p. 503-556
Peer Reviewed verified by ORBi
 

Files


Full Text
ernst05a.pdf
Publisher postprint (1.32 MB)
The fitted Q iteration (FQI) algorithm was first described in the paper "Iteratively extending time horizon reinforcement learning" (see below), but this paper is the first to call it fitted Q iteration (FQI for short).
Annexes
ernst-fittedQIteration.pdf
Publisher postprint (335.04 kB)
Presentation that gives a brief overview of our work on fitted Q iteration.
ernst-icopi2005-slides.pdf
Publisher postprint (769.1 kB)
Presentation that discusses several strategies for using supervised learning in the context of batch-mode reinforcement learning.

Details



Keywords :
fitted Q iteration; batch mode reinforcement learning; ensemble of regression trees; supervised learning; fitted value iteration; optimal control
Abstract :
Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, this can be achieved by approximating the so-called Q-function from a set of four-tuples (x(t), u(t), r(t), x(t+1)), where x(t) denotes the system state at time t, u(t) the control action taken, r(t) the instantaneous reward obtained, and x(t+1) the successor state of the system, and then determining the control policy from this Q-function. The Q-function approximation may be obtained as the limit of a sequence of (batch mode) supervised learning problems. Within this framework we describe the use of several classical tree-based supervised learning methods (CART, Kd-tree, tree bagging) and two newly proposed ensemble algorithms, namely extremely randomized trees and totally randomized trees. We study their performance on several examples and find that the ensemble methods based on regression trees perform well in extracting relevant information about the optimal control policy from sets of four-tuples. In particular, totally randomized trees give good results while ensuring the convergence of the sequence, whereas extremely randomized trees, which relax the convergence constraint, provide even better accuracy.
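The sequence of supervised learning problems mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the regressor is a plain per-input averager standing in for the tree-based methods, and the four-state chain, the `step` function, the discount factor `gamma = 0.5`, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict


def fit_averager(inputs, targets):
    """Stand-in regressor (assumption): mean target per distinct (x, u) input.

    The paper uses tree-based regressors here; any supervised learner that
    maps (state, action) inputs to scalar targets could be plugged in.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(inputs, targets):
        sums[key] += y
        counts[key] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    return lambda x, u: means.get((x, u), 0.0)


def fitted_q_iteration(four_tuples, actions, gamma, n_iterations, fit=fit_averager):
    """Build Q_1, Q_2, ... from four-tuples (x, u, r, x_next), starting at Q_0 = 0."""
    q = lambda x, u: 0.0
    inputs = [(x, u) for x, u, _, _ in four_tuples]
    for _ in range(n_iterations):
        # Regression targets use the previous approximation of the Q-function.
        targets = [r + gamma * max(q(x_next, a) for a in actions)
                   for _, _, r, x_next in four_tuples]
        q = fit(inputs, targets)
    return q


# Hypothetical toy system: a chain of states 0..3, actions move left or right,
# and a reward of 1 is received whenever the successor state is 3.
ACTIONS = (-1, 1)


def step(x, u):
    x_next = min(max(x + u, 0), 3)
    return (1.0 if x_next == 3 else 0.0), x_next


four_tuples = []
for x in range(4):
    for u in ACTIONS:
        r, x_next = step(x, u)
        four_tuples.append((x, u, r, x_next))

q_hat = fitted_q_iteration(four_tuples, ACTIONS, gamma=0.5, n_iterations=30)

# Greedy control policy derived from the approximate Q-function.
policy = {x: max(ACTIONS, key=lambda u: q_hat(x, u)) for x in range(4)}
```

On this toy chain the greedy policy moves right in every state. Replacing `fit_averager` with a regression-tree ensemble would bring the sketch closer to the setting studied in the paper, at the cost of an external dependency.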
Disciplines :
Computer science
Author, co-author :
Ernst, Damien; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Geurts, Pierre; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Wehenkel, Louis; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Language :
English
Title :
Tree-based batch mode reinforcement learning
Publication date :
April 2005
Journal title :
Journal of Machine Learning Research
ISSN :
1532-4435
eISSN :
1533-7928
Publisher :
Microtome Publishing, Brookline, United States - Massachusetts
Volume :
6
Pages :
503-556
Peer reviewed :
Peer Reviewed verified by ORBi
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Available on ORBi :
since 22 March 2009

Statistics


Number of views
1217 (80 by ULiège)
Number of downloads
1274 (35 by ULiège)

Scopus citations®
796
Scopus citations® without self-citations
756
