Reference: Iteratively extending time horizon reinforcement learning

Scientific congresses and symposiums: Paper published in a book

Engineering, computing & technology: Computer science

http://hdl.handle.net/2268/9361

Title: Iteratively extending time horizon reinforcement learning

Language: English

Authors:
Ernst, Damien (Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation)
Geurts, Pierre (Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation)
Wehenkel, Louis (Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation)

Publication year: 2003

Published in: Machine Learning: ECML 2003, 14th European Conference on Machine Learning

Publisher: Springer-Verlag Berlin

Series: Lecture Notes in Artificial Intelligence, Volume 2837

Pages: 96-107

Peer reviewed: Yes

Audience: International

ISBN: 978-3-540-20121-2

City: Berlin

Event: 14th European Conference on Machine Learning (ECML 2003)

Keywords: [en] reinforcement learning ; regression trees ; Q-function

Abstract: [en] Reinforcement learning aims to determine an (infinite time horizon) optimal control policy from interaction with a system. It can be solved by approximating the so-called Q-function from a sample of four-tuples (x(t), u(t), r(t), x(t+1)), where x(t) denotes the system state at time t, u(t) the control action taken, r(t) the instantaneous reward obtained, and x(t+1) the successor state of the system, and by determining the optimal control from this Q-function. Classical reinforcement learning algorithms use an ad hoc version of stochastic approximation which iterates over the Q-function approximations on a four-tuple by four-tuple basis. In this paper, we reformulate this problem as a sequence of batch-mode supervised learning problems which in the limit converges to (an approximation of) the Q-function. Each step of this algorithm uses the full sample of four-tuples gathered from interaction with the system and extends by one step the horizon of the optimality criterion. An advantage of this approach is that it allows the use of standard batch-mode supervised learning algorithms, instead of the incremental versions used up to now. In addition to a theoretical justification, the paper provides empirical tests on the "Car on the Hill" control problem using ensembles of regression trees. The resulting algorithm is in principle able to handle large-scale reinforcement learning problems efficiently.

Intended audience: Researchers ; Professionals

DOI: 10.1007/b13633

Link: http://www.montefiore.ulg.ac.be/~ernst/
