Optimal sample selection for batch-mode reinforcement learning

Rachelson, Emmanuel; Schnitzler, François; Wehenkel, Louis; Ernst, Damien

Paper published in a book (Scientific congresses and symposiums)

2011 • In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART 2011)

Peer reviewed

Permalink
https://hdl.handle.net/2268/83529

Files

Full Text

icaart-2011.pdf

Publisher postprint (794.26 kB)

Details

Keywords :

stochastic optimal control; sample control; reinforcement learning

Abstract :

[en] We introduce the Optimal Sample Selection (OSS) meta-algorithm for solving discrete-time Optimal Control problems. This meta-algorithm maps the problem of finding a near-optimal closed-loop policy to the identification of a small set of one-step system transitions that leads to high-quality policies when used as input to a batch-mode Reinforcement Learning (RL) algorithm. We detail a particular instance of this OSS meta-algorithm that uses tree-based Fitted Q-Iteration as the batch-mode RL algorithm and Cross Entropy search as the method for navigating efficiently in the space of sample sets. The results show that this particular instance of OSS is able to rapidly identify small sample sets leading to high-quality policies.
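As a rough illustration of the search component named in the abstract, the sketch below runs a generic diagonal-Gaussian cross-entropy search in Python. It is not the authors' implementation. The names `cross_entropy_search` and `toy_score`, the parameter defaults, and the encoding of a candidate as a flat list of floats are all assumptions made for this example. In an OSS-style setting the score of a candidate sample set would presumably be the quality of the policy that a batch-mode RL algorithm learns from it. Here a simple quadratic stands in for that evaluation.

```python
import random
import statistics


def cross_entropy_search(score, dim, n_iter=30, pop_size=50, n_elite=10, seed=0):
    """Maximise score(x) over R^dim with a diagonal-Gaussian cross-entropy search.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    for _ in range(n_iter):
        # Draw candidates from the current sampling distribution.
        population = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)]
            for _ in range(pop_size)
        ]
        # Keep the best-scoring candidates (the elite set).
        elite = sorted(population, key=score, reverse=True)[:n_elite]
        # Refit the Gaussian to the elite set; the small floor keeps std positive.
        for d in range(dim):
            column = [x[d] for x in elite]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-6
    return mean


def toy_score(x):
    # Assumption: stand-in for "return of the policy learned from sample set x".
    # A real OSS-style score would run a batch-mode RL algorithm on the samples
    # encoded by x and then evaluate the resulting policy.
    target = [0.5, -0.25, 1.0, 0.0]
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


best = cross_entropy_search(toy_score, dim=4)
print(best, toy_score(best))
```

Replacing `toy_score` with a function that decodes the candidate into one-step transitions, trains a policy on them, and returns its estimated return would give the overall shape of the approach the abstract describes, under the assumptions stated above.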

Disciplines :

Computer science

Author, co-author :

Rachelson, Emmanuel

Schnitzler, François ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Optimal sample selection for batch-mode reinforcement learning

Publication date :

2011

Event name :

3rd International Conference on Agents and Artificial Intelligence (ICAART 2011)

Event place :

Rome, Italy

Event date :

28-30 January 2011

Audience :

International

Main work title :

Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART 2011)

Peer review/Selection committee :

Peer reviewed

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique

Available on ORBi :

since 03 February 2011
