Paper published in a book (Scientific congresses and symposiums)
Iteratively extending time horizon reinforcement learning
Ernst, Damien; Geurts, Pierre; Wehenkel, Louis
2003. In Machine Learning: ECML 2003, 14th European Conference on Machine Learning
Peer reviewed


Details



Keywords :
reinforcement learning; regression trees; Q-function
Abstract :
Reinforcement learning aims to determine an (infinite time horizon) optimal control policy from interaction with a system. It can be solved by approximating the so-called Q-function from a sample of four-tuples (x(t), u(t), r(t), x(t+1)), where x(t) denotes the system state at time t, u(t) the control action taken, r(t) the instantaneous reward obtained and x(t+1) the successor state of the system, and by then determining the optimal control from the Q-function. Classical reinforcement learning algorithms use an ad hoc version of stochastic approximation, which iterates on the Q-function approximation one four-tuple at a time. In this paper, we reformulate this problem as a sequence of batch mode supervised learning problems which in the limit converges to (an approximation of) the Q-function. Each step of this algorithm uses the full sample of four-tuples gathered from interaction with the system and extends the horizon of the optimality criterion by one step. An advantage of this approach is that it allows the use of standard batch mode supervised learning algorithms instead of the incremental versions used up to now. In addition to a theoretical justification, the paper provides empirical tests on the "Car on the Hill" control problem using ensembles of regression trees. The resulting algorithm is, in principle, able to handle large-scale reinforcement learning problems efficiently.
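The iteration described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a discount factor `gamma` for the infinite-horizon criterion, and it replaces the paper's ensembles of regression trees with a hypothetical `TabularMeanRegressor` (the mean target per exact (state, action) pair), which only makes sense for tiny discrete problems. The names `iterate_q`, `chain_step` and the four-state chain system are likewise hypothetical. Starting from Q_0 = 0, step N fits one batch regression over the full sample with targets r(t) + gamma · max_u Q_{N-1}(x(t+1), u), so Q_N reflects an N-step horizon.

```python
import random
from collections import defaultdict


class TabularMeanRegressor:
    """Hypothetical stand-in for a batch supervised learner (the paper uses
    ensembles of regression trees): predicts the mean target seen for each
    exact (state, action) input."""

    def fit(self, inputs, targets):
        sums, counts = defaultdict(float), defaultdict(int)
        for key, y in zip(inputs, targets):
            sums[key] += y
            counts[key] += 1
        self.table = {k: sums[k] / counts[k] for k in sums}
        return self

    def predict(self, key):
        # Unseen inputs default to 0.0 (an assumption of this sketch).
        return self.table.get(key, 0.0)


def iterate_q(four_tuples, actions, gamma, n_iterations, make_regressor):
    """Return a regressor approximating Q_N after n_iterations batch steps.

    Each step uses the FULL sample of (x, u, r, x_next) tuples and extends
    the optimization horizon by one step; Q_0 is identically zero.
    """
    inputs = [(x, u) for x, u, _, _ in four_tuples]
    q = None  # None stands for Q_0 = 0
    for _ in range(n_iterations):
        targets = []
        for x, u, r, x_next in four_tuples:
            future = 0.0 if q is None else max(q.predict((x_next, a)) for a in actions)
            targets.append(r + gamma * future)
        q = make_regressor().fit(inputs, targets)
    return q


def chain_step(x, u, n_states=4):
    """Hypothetical deterministic chain: u=1 moves right, u=0 moves left;
    reward 1 whenever the successor state is the rightmost one."""
    x_next = min(x + 1, n_states - 1) if u == 1 else max(x - 1, 0)
    return (1.0 if x_next == n_states - 1 else 0.0), x_next


# Gather a sample of four-tuples from random interaction with the system.
rng = random.Random(0)
sample = []
for _ in range(500):
    x, u = rng.randrange(4), rng.randrange(2)
    r, x_next = chain_step(x, u)
    sample.append((x, u, r, x_next))

q = iterate_q(sample, actions=(0, 1), gamma=0.5, n_iterations=40,
              make_regressor=TabularMeanRegressor)
# Greedy policy derived from the final Q-function approximation.
policy = {x: max((0, 1), key=lambda a: q.predict((x, a))) for x in range(4)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Because the regression step is an ordinary `fit(inputs, targets)` call on the whole sample, any batch learner with that interface could be swapped in; with gamma < 1 the distance between Q_N and the infinite-horizon Q-function shrinks geometrically with N.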
Disciplines :
Computer science
Author, co-author :
Ernst, Damien; Université de Liège - ULiège > Dept. of Electricity, Electronics and Computer Science (Montefiore Institute) > Systems and Modeling
Geurts, Pierre; Université de Liège - ULiège > Dept. of Electricity, Electronics and Computer Science (Montefiore Institute) > Systems and Modeling
Wehenkel, Louis; Université de Liège - ULiège > Dept. of Electricity, Electronics and Computer Science (Montefiore Institute) > Systems and Modeling
Language :
English
Publication date :
2003
Event name :
14th European Conference on Machine Learning (ECML 2003)
Audience :
International
Main work title :
Machine Learning: ECML 2003, 14th European Conference on Machine Learning
Publisher :
Springer-Verlag Berlin, Berlin, Germany
ISBN/EAN :
978-3-540-20121-2
Collection name :
Lecture Notes in Artificial Intelligence, Volume 2837
Pages :
96-107
Available on ORBi :
since 22 March 2009
