Antos, A., Munos, R., & Szepesvári, C. (2007). Fitted Q-iteration in continuous action-space MDPs. In Advances in neural information processing systems (NIPS) (Vol. 20).
Bellman, R. (1957). Dynamic programming. Princeton: Princeton University Press.
Bonarini, A., Caccia, C., Lazaric, A., & Restelli, M. (2008). Batch reinforcement learning for controlling a mobile wheeled pendulum robot. In Artificial intelligence in theory and practice II (pp. 151-160).
Boyan, J., & Moore, A. (1995). Generalization in reinforcement learning: safely approximating the value function. In Advances in neural information processing systems (NIPS) (Vol. 7, pp. 369-376). Denver: MIT Press.
Bradtke, S., & Barto, A. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22, 33-57.
Busoniu, L., Babuska, R., De Schutter, B., & Ernst, D. (2010). Reinforcement learning and dynamic programming using function approximators. London: Taylor & Francis/CRC Press.
Castelletti, A., de Rigo, D., Rizzoli, A., Soncini-Sessa, R., & Weber, E. (2007). Neuro-dynamic programming for designing water reservoir network management policies. Control Engineering Practice, 15(8), 1031-1038.
Castelletti, A., Galelli, S., Restelli, M., & Soncini-Sessa, R. (2010). Tree-based reinforcement learning for optimal water reservoir operation. Water Resources Research, 46, W09507.
Chakraborty, B., Strecher, V., & Murphy, S. (2008). Bias correction and confidence intervals for fitted Q-iteration. In Workshop on model uncertainty and risk in reinforcement learning (NIPS), Whistler, Canada.
Defourny, B., Ernst, D., & Wehenkel, L. (2008). Risk-aware decision making and dynamic programming. In Workshop on model uncertainty and risk in reinforcement learning (NIPS), Whistler, Canada.
Ernst, D., Geurts, P., & Wehenkel, L. (2003). Iteratively extending time horizon reinforcement learning. In European conference on machine learning (ECML) (pp. 96-107).
Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6, 503-556.
Ernst, D., Marée, R., & Wehenkel, L. (2006a). Reinforcement learning with raw image pixels as state input. In Lecture notes in computer science: Vol. 4153. International workshop on intelligent computing in pattern analysis/synthesis (IWICPAS) (pp. 446-454).
Ernst, D., Stan, G., Goncalves, J., & Wehenkel, L. (2006b). Clinical data based optimal STI strategies for HIV: a reinforcement learning approach. In Machine learning conference of Belgium and the Netherlands (BeNeLearn) (pp. 65-72).
Ernst, D., Glavic, M., Capitanescu, F., & Wehenkel, L. (2009). Reinforcement learning versus model predictive control: a comparison on a power system problem. IEEE Transactions on Systems, Man and Cybernetics. Part B. Cybernetics, 39, 517-529.
Farahmand, A., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized fitted Q-iteration: application to planning. In S. Girgin, M. Loth, R. Munos, P. Preux, & D. Ryabko (Eds.), Lecture notes in computer science: Vol. 5323. Recent advances in reinforcement learning (pp. 55-68). Berlin/Heidelberg: Springer.
Fonteneau, R. (2011). Contributions to batch mode reinforcement learning. Ph.D. thesis, University of Liège.
Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2009). Inferring bounds on the performance of a control policy from a sample of trajectories. In IEEE symposium on adaptive dynamic programming and reinforcement learning (ADPRL), Nashville, TN, USA.
Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010a). A cautious approach to generalization in reinforcement learning. In Second international conference on agents and artificial intelligence (ICAART), Valencia, Spain.
Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010b). Generating informative trajectories by using bounds on the return of control policies. In Workshop on active learning and experimental design 2010 (in conjunction with AISTATS 2010).
Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010c). Model-free Monte Carlo-like policy evaluation. In JMLR: W&CP: Vol. 9. Thirteenth international conference on artificial intelligence and statistics (AISTATS) (pp. 217-224), Chia Laguna, Sardinia, Italy.
Fonteneau, R., Murphy, S. A., Wehenkel, L., & Ernst, D. (2010d). Towards min max generalization in reinforcement learning. In Communications in computer and information science (CCIS): Vol. 129. Agents and artificial intelligence: international conference (ICAART 2010), Valencia, Spain, revised selected papers (pp. 61-77). Heidelberg: Springer.
Gordon, G. (1995). Stable function approximation in dynamic programming. In Twelfth international conference on machine learning (ICML) (pp. 261-268).
Gordon, G. (1999). Approximate solutions to Markov decision processes. Ph.D. thesis, Carnegie Mellon University.
Guez, A., Vincent, R., Avoli, M., & Pineau, J. (2008). Adaptive treatment of epilepsy via batch-mode reinforcement learning. In Innovative applications of artificial intelligence (IAAI).
Lagoudakis, M., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107-1149.
Lange, S., & Riedmiller, M. (2010). Deep learning of visual control policies. In European symposium on artificial neural networks, computational intelligence and machine learning (ESANN), Brugge, Belgium.
Lazaric, A., Ghavamzadeh, M., & Munos, R. (2010a). Finite-sample analysis of least-squares policy iteration (Tech. Rep.). SEQUEL (INRIA) Lille-Nord Europe.
Lazaric, A., Ghavamzadeh, M., & Munos, R. (2010b). Finite-sample analysis of LSTD. In International conference on machine learning (ICML) (pp. 615-622).
Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H., & Tanaka, T. (2010a). Nonparametric return density estimation for reinforcement learning. In 27th international conference on machine learning (ICML), Haifa, Israel, June 21-25.
Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H., & Tanaka, T. (2010b). Parametric return density estimation for reinforcement learning. In 26th conference on uncertainty in artificial intelligence (UAI), Catalina Island, California, USA, July 8-11 (pp. 368-375).
Munos, R., & Szepesvári, C. (2008). Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, 9, 815-857.
Murphy, S. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society. Series B, 65(2), 331-366.
Murphy, S., van der Laan, M., & Robins, J. (2001). Marginal mean models for dynamic regimes. Journal of the American Statistical Association, 96(456), 1410-1423.
Nedić, A., & Bertsekas, D. P. (2003). Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 13, 79-110. doi:10.1023/A:1022192903948.
Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49(2-3), 161-178.
Peters, J., Vijayakumar, S., & Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Third IEEE-RAS international conference on humanoid robots (pp. 1-20).
Pietquin, O., Tango, F., & Aras, R. (2011). Batch reinforcement learning for optimizing longitudinal driving assistance strategies. In IEEE symposium on computational intelligence in vehicles and transportation systems (CIVTS) (pp. 73-79). Los Alamitos: IEEE Comput. Soc.
Riedmiller, M. (2005). Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In Sixteenth European conference on machine learning (ECML), Porto, Portugal (pp. 317-328).
Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9-12), 1393-1512.
Sutton, R. (1996). Generalization in reinforcement learning: successful examples using sparse coarse coding. In Advances in neural information processing systems (NIPS) (Vol. 8, pp. 1038-1044). Denver: MIT Press.
Sutton, R., & Barto, A. (1998). Reinforcement learning: an introduction. Cambridge: MIT Press.
Timmer, S., & Riedmiller, M. (2007). Fitted Q iteration with CMACs. In IEEE symposium on approximate dynamic programming and reinforcement learning (ADPRL) (pp. 1-8). Los Alamitos: IEEE Comput. Soc.
Tognetti, S., Savaresi, S., Spelta, C., & Restelli, M. (2009). Batch reinforcement learning for semi-active suspension control. In Control applications (CCA) & intelligent control (ISIC) (pp. 582-587). Los Alamitos: IEEE Comput. Soc.