Approximate Bayes Optimal Policy Search using Neural Networks

Castronovo, Michaël; François-Lavet, Vincent; Fonteneau, Raphaël; Ernst, Damien; Couëtoux, Adrien

doi:10.5220/0006191701420153

Paper published in a book (Scientific congresses and symposiums)

2017 • In Proceedings of the 9th International Conference on Agents and Artificial Intelligence (ICAART 2017)

Peer reviewed

Permalink
https://hdl.handle.net/2268/204410

Files

Full Text

ANN-BRL_final.pdf

Publisher postprint (300.99 kB)

Details

Keywords :

Bayesian reinforcement learning; artificial neural networks; offline policy search

Abstract :

Bayesian Reinforcement Learning (BRL) agents aim to maximise the expected rewards collected while interacting with an unknown Markov Decision Process (MDP), using some prior knowledge. State-of-the-art BRL agents rely on frequent updates of the belief over the MDP as new observations of the environment are made. This offers theoretical guarantees of convergence to an optimum, but is computationally intractable, even on small-scale problems. In this paper, we present a method that circumvents this issue by training a parametric policy able to recommend an action directly from raw observations. Artificial Neural Networks (ANNs) are used to represent this policy, and are trained on trajectories sampled from the prior. The trained model is then used online, and is able to act on the real MDP at a very low computational cost. Our new algorithm shows strong empirical performance on a wide range of test problems, and is robust to inaccuracies of the prior distribution.
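
The offline/online split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the "MDP" is reduced to a two-armed Bernoulli bandit, the prior is uniform, the history is summarised by hand-picked count features, the training labels are the optimal arm of each sampled bandit, and a single logistic unit stands in for the neural network. All names (sample_mdp, build_dataset, train, policy) are hypothetical.

```python
import math
import random

random.seed(0)

HORIZON = 20   # steps per sampled trajectory (assumed value)
N_MDPS = 300   # number of MDPs drawn from the prior (assumed value)

def sample_mdp():
    """Draw a two-armed Bernoulli bandit from a uniform prior (toy stand-in for an MDP prior)."""
    return [random.random(), random.random()]

def features(counts):
    """Summarise a history as (successes - failures) for each arm."""
    return [counts[0][0] - counts[0][1], counts[1][0] - counts[1][1]]

def build_dataset():
    """Offline step 1: roll out random trajectories on sampled MDPs and label
    each visited history with the arm that is optimal for that sampled MDP."""
    data = []
    for _ in range(N_MDPS):
        p = sample_mdp()
        best = 0 if p[0] >= p[1] else 1
        counts = [[0, 0], [0, 0]]  # counts[arm] = [successes, failures]
        for _ in range(HORIZON):
            arm = random.randrange(2)
            reward = random.random() < p[arm]
            counts[arm][0 if reward else 1] += 1
            data.append((features(counts), best))
    return data

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=100, lr=0.05):
    """Offline step 2: fit a logistic unit by batch gradient descent
    (a deliberately tiny stand-in for the ANN mentioned in the abstract)."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b

def policy(params, counts):
    """Online step: one cheap forward pass, with no belief update and no planning."""
    w, b = params
    x = features(counts)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

params = train(build_dataset())

# Online use on a "real" bandit that the agent has never seen.
real_p = [0.2, 0.8]
counts = [[0, 0], [0, 0]]
total_reward = 0
for _ in range(HORIZON):
    arm = policy(params, counts)
    reward = random.random() < real_p[arm]
    counts[arm][0 if reward else 1] += 1
    total_reward += reward
print("reward collected online:", total_reward)
```

All the expensive work (sampling from the prior, generating trajectories, fitting the model) happens before any interaction with the real problem. At decision time the agent only evaluates the trained model on a summary of its history, which is what keeps the online cost low.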

Disciplines :

Computer science

Author, co-author :

Castronovo, Michaël ; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

François-Lavet, Vincent ; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore)

Fonteneau, Raphaël ; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore)

Ernst, Damien ; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

Couëtoux, Adrien ; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

Language :

English

Title :

Approximate Bayes Optimal Policy Search using Neural Networks

Publication date :

February 2017

Event name :

9th International Conference on Agents and Artificial Intelligence (ICAART 2017)

Event place :

Porto, Portugal

Event date :

24 February 2017 to 26 February 2017

Audience :

International

Main work title :

Proceedings of the 9th International Conference on Agents and Artificial Intelligence (ICAART 2017)

Peer review/Selection committee :

Peer reviewed

Tags :

CÉCI : Consortium des Équipements de Calcul Intensif

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique

Available on ORBi :

since 16 December 2016
