Antos, A., Munos, R., & Szepesvári, C. (2007). Fitted Q-iteration in continuous action-space MDPs. In Advances in neural information processing systems (NIPS) (Vol. 20).
Bellman, R. (1957). Dynamic programming. Princeton: Princeton University Press.
Bonarini, A., Caccia, C., Lazaric, A., & Restelli, M. (2008). Batch reinforcement learning for controlling a mobile wheeled pendulum robot. In Artificial intelligence in theory and practice II (pp. 151-160).
Boyan, J., & Moore, A. (1995). Generalization in reinforcement learning: safely approximating the value function. In Advances in neural information processing systems (NIPS) (Vol. 7, pp. 369-376). Denver: MIT Press.
Bradtke, S., & Barto, A. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22, 33-57.
Busoniu, L., Babuska, R., De Schutter, B., & Ernst, D. (2010). Reinforcement learning and dynamic programming using function approximators. London: Taylor & Francis/CRC Press.
Castelletti, A., de Rigo, D., Rizzoli, A., Soncini-Sessa, R., & Weber, E. (2007). Neuro-dynamic programming for designing water reservoir network management policies. Control Engineering Practice, 15(8), 1031-1038.
Castelletti, A., Galelli, S., Restelli, M., & Soncini-Sessa, R. (2010). Tree-based reinforcement learning for optimal water reservoir operation. Water Resources Research, 46, W09507.
Chakraborty, B., Strecher, V., & Murphy, S. (2008). Bias correction and confidence intervals for fitted Q-iteration. In Workshop on model uncertainty and risk in reinforcement learning (NIPS), Whistler, Canada.
Defourny, B., Ernst, D., & Wehenkel, L. (2008). Risk-aware decision making and dynamic programming. In Workshop on model uncertainty and risk in reinforcement learning (NIPS), Whistler, Canada.
Ernst, D., Geurts, P., & Wehenkel, L. (2003). Iteratively extending time horizon reinforcement learning. In European conference on machine learning (ECML) (pp. 96-107).
Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6, 503-556.
Ernst, D., Marée, R., & Wehenkel, L. (2006a). Reinforcement learning with raw image pixels as state input. In Lecture notes in computer science: Vol. 4153. International workshop on intelligent computing in pattern analysis/synthesis (IWICPAS) (pp. 446-454).
Ernst, D., Stan, G., Goncalves, J., & Wehenkel, L. (2006b). Clinical data based optimal STI strategies for HIV: a reinforcement learning approach. In Machine learning conference of Belgium and the Netherlands (BeNeLearn) (pp. 65-72).
Ernst, D., Glavic, M., Capitanescu, F., & Wehenkel, L. (2009). Reinforcement learning versus model predictive control: a comparison on a power system problem. IEEE Transactions on Systems, Man and Cybernetics. Part B. Cybernetics, 39, 517-529.
Farahmand, A., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized fitted Q-iteration: application to planning. In S. Girgin, M. Loth, R. Munos, P. Preux, & D. Ryabko (Eds.), Lecture notes in computer science: Vol. 5323. Recent advances in reinforcement learning (pp. 55-68). Berlin/Heidelberg: Springer.
Fonteneau, R. (2011). Contributions to batch mode reinforcement learning. Ph.D. thesis, University of Liège.
Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2009). Inferring bounds on the performance of a control policy from a sample of trajectories. In IEEE symposium on adaptive dynamic programming and reinforcement learning (ADPRL), Nashville, TN, USA.
Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010a). A cautious approach to generalization in reinforcement learning. In Second international conference on agents and artificial intelligence (ICAART), Valencia, Spain.
Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010b). Generating informative trajectories by using bounds on the return of control policies. In Workshop on active learning and experimental design 2010 (in conjunction with AISTATS 2010).
Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010c). Model-free Monte Carlo-like policy evaluation. In JMLR: W&CP: Vol. 9. Thirteenth international conference on artificial intelligence and statistics (AISTATS) (pp. 217-224), Chia Laguna, Sardinia, Italy.
Fonteneau, R., Murphy, S. A., Wehenkel, L., & Ernst, D. (2010d). Towards min max generalization in reinforcement learning. In Communications in computer and information science (CCIS): Vol. 129. Agents and artificial intelligence: international conference (ICAART 2010), Valencia, Spain, revised selected papers (pp. 61-77). Heidelberg: Springer.
Gordon, G. (1995). Stable function approximation in dynamic programming. In Twelfth international conference on machine learning (ICML) (pp. 261-268).
Gordon, G. (1999). Approximate solutions to Markov decision processes. Ph.D. thesis, Carnegie Mellon University.
Guez, A., Vincent, R., Avoli, M., & Pineau, J. (2008). Adaptive treatment of epilepsy via batch-mode reinforcement learning. In Innovative applications of artificial intelligence (IAAI).
Lagoudakis, M., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107-1149.
Lange, S., & Riedmiller, M. (2010). Deep learning of visual control policies. In European symposium on artificial neural networks, computational intelligence and machine learning (ESANN), Brugge, Belgium.
Lazaric, A., Ghavamzadeh, M., & Munos, R. (2010a). Finite-sample analysis of least-squares policy iteration (Tech. Rep.). SEQUEL (INRIA) Lille-Nord Europe.
Lazaric, A., Ghavamzadeh, M., & Munos, R. (2010b). Finite-sample analysis of LSTD. In International conference on machine learning (ICML) (pp. 615-622).
Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H., & Tanaka, T. (2010a). Nonparametric return density estimation for reinforcement learning. In 27th international conference on machine learning (ICML), Haifa, Israel, June 21-25.
Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H., & Tanaka, T. (2010b). Parametric return density estimation for reinforcement learning. In 26th conference on uncertainty in artificial intelligence (UAI), Catalina Island, California, USA, July 8-11 (pp. 368-375).
Munos, R., & Szepesvári, C. (2008). Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, 9, 815-857.
Murphy, S. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society. Series B, 65(2), 331-366.
Murphy, S., van der Laan, M., & Robins, J. (2001). Marginal mean models for dynamic regimes. Journal of the American Statistical Association, 96(456), 1410-1423.
Nedić, A., & Bertsekas, D. P. (2003). Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 13, 79-110. doi:10.1023/A:1022192903948.
Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49(2-3), 161-178.
Peters, J., Vijayakumar, S., & Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Third IEEE-RAS international conference on humanoid robots (pp. 1-20).
Pietquin, O., Tango, F., & Aras, R. (2011). Batch reinforcement learning for optimizing longitudinal driving assistance strategies. In IEEE symposium on computational intelligence in vehicles and transportation systems (CIVTS) (pp. 73-79). Los Alamitos: IEEE Comput. Soc.
Riedmiller, M. (2005). Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In Sixteenth European conference on machine learning (ECML), Porto, Portugal (pp. 317-328).
Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9-12), 1393-1512.
Sutton, R. (1996). Generalization in reinforcement learning: successful examples using sparse coarse coding. In Advances in neural information processing systems (NIPS) (Vol. 8, pp. 1038-1044). Denver: MIT Press.
Sutton, R., & Barto, A. (1998). Reinforcement learning: an introduction. Cambridge: MIT Press.
Timmer, S., & Riedmiller, M. (2007). Fitted Q iteration with CMACs. In IEEE symposium on approximate dynamic programming and reinforcement learning (ADPRL) (pp. 1-8). Los Alamitos: IEEE Comput. Soc.
Tognetti, S., Savaresi, S., Spelta, C., & Restelli, M. (2009). Batch reinforcement learning for semi-active suspension control. In Control applications (CCA) & intelligent control (ISIC) (pp. 582-587). Los Alamitos: IEEE Comput. Soc.