Abstract:
Reinforcement learning (RL) is a promising paradigm for learning optimal control. Although RL is generally envisioned as working without any prior knowledge about the system, such knowledge is often available and can be exploited to great advantage. In this paper, we consider prior knowledge about the monotonicity of the control policy with respect to the system states, and we introduce an approach that exploits this type of prior knowledge to accelerate a state-of-the-art RL algorithm called online least-squares policy iteration (LSPI). Monotonic policies are appropriate for important classes of systems that appear in control applications. LSPI is a data-efficient RL algorithm that we previously extended to online learning, but which until now did not provide a way to use prior knowledge about the policy. In an empirical evaluation, online LSPI with prior knowledge learns much faster and more reliably than the original online LSPI.
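To make the monotonicity prior concrete, here is a minimal sketch of one simple way such a constraint can be imposed on a greedy policy derived from a learned Q-function. This is an illustration under assumed simplifications (a 1-D discretized state space, a tabular Q-function, and a running-maximum projection onto non-decreasing action sequences), not the paper's actual LSPI-based method.

```python
import numpy as np

def greedy_policy(q_values):
    """Greedy action index per state from a (n_states, n_actions) Q-table."""
    return np.argmax(q_values, axis=1)

def project_monotone(actions):
    """Project a per-state action sequence onto non-decreasing sequences.
    A running maximum is one crude way to enforce the monotonicity prior;
    the paper's approach is integrated into LSPI itself."""
    return np.maximum.accumulate(actions)

# Toy Q-table: 5 states, 3 actions (values are made up for illustration).
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 3))

pi = greedy_policy(q)            # unconstrained greedy policy
pi_mono = project_monotone(pi)   # policy after imposing monotonicity
print(pi, pi_mono)
```

The point of the sketch is only that a monotonicity prior shrinks the policy search space: any candidate policy that violates the known ordering can be projected back or discarded, which is the intuition behind the reported speed-up.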