Aggregating Optimistic Planning Trees for Solving Markov Decision Processes

Kedenburg, Gunnar; Fonteneau, Raphaël; Munos, Rémi

Paper published in a book (Scientific congresses and symposiums)

2013 • In Advances in Neural Information Processing Systems 26 (2013)

Peer reviewed

Permalink
https://hdl.handle.net/2268/161574

Files

Full Text

nips2013.pdf

Publisher postprint (276.47 kB)

Details

Keywords :

Reinforcement Learning; Markov Decision Processes; On-line Planning

Abstract :

This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a “safe” optimistic planning strategy combining the optimistic principle (in order to explore the most promising part of the search space first) with a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We also report numerical results on a benchmark problem. Our algorithm performs as well as state-of-the-art optimistic planning algorithms, and better than a related algorithm which additionally assumes the knowledge of all transition distributions.
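
The decision-making step described in the abstract can be illustrated with a toy sketch. This is not the paper's aggregation procedure, whose details are not given on this page: it assumes, purely for illustration, that each planning tree has already produced one value estimate per immediate action, and it combines the trees by plain averaging. The function name `recommend_action`, the action labels, and the numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def recommend_action(per_tree_values):
    """Average per-action value estimates over a forest and pick the best action.

    per_tree_values: list of dicts, one per planning tree, each mapping an
    immediate action to that tree's value estimate (an assumed input format).
    Returns the recommended action and the averaged estimates.
    """
    pooled = defaultdict(list)
    for tree_values in per_tree_values:
        for action, value in tree_values.items():
            pooled[action].append(value)

    # Plain averaging is an illustrative choice, not the paper's rule.
    averaged = {action: mean(values) for action, values in pooled.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


# Hypothetical forest of three trees, each built from a different random
# realization of the environment.
forest = [
    {"left": 0.5, "right": 0.25},
    {"left": 0.25, "right": 0.75},
    {"left": 0.5, "right": 1.0},
]
best_action, averaged = recommend_action(forest)
print(best_action, averaged)
```

Averaging over trees reduces the influence of any single random realization on the recommended action, which is the intuition behind using a forest rather than one tree.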

Disciplines :

Computer science

Author, co-author :

Kedenburg, Gunnar

Fonteneau, Raphaël; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Munos, Rémi; Inria Lille - Nord Europe

Language :

English

Title :

Aggregating Optimistic Planning Trees for Solving Markov Decision Processes

Publication date :

2013

Event name :

Neural Information Processing Systems 26 (2013)

Event place :

Lake Tahoe, United States

Event date :

December 5-10, 2013

Audience :

International

Main work title :

Advances in Neural Information Processing Systems 26 (2013)

Pages :

2382-2390

Peer review/Selection committee :

Peer reviewed

Available on ORBi :

since 20 January 2014

Statistics

Number of views

94 (0 by ULiège)

Number of downloads

35 (0 by ULiège)
