Antos A., Munos R., Szepesvári C., « Fitted Q-iteration in continuous action-space MDPs », in J. Platt, D. Koller, Y. Singer, S. Roweis (eds), Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA, p. 9-16, 2008.
Bradtke S., Barto A., « Linear least-squares algorithms for temporal difference learning », Machine Learning, vol. 22, p. 33-57, 1996.
Busoniu L., Babuska R., De Schutter B., Ernst D., Reinforcement Learning and Dynamic Programming using Function Approximators, Taylor & Francis CRC Press, 2010.
Dayan P., « The Convergence of TD(λ) for general λ », Machine Learning, vol. 8, p. 341-362, 1992.
Dimitrakakis C., Lagoudakis M. G., « Rollout sampling approximate policy iteration », Machine Learning, vol. 72, p. 157-171, 2008.
Ernst D., Geurts P., Wehenkel L., « Tree-based batch mode reinforcement learning », Journal of Machine Learning Research, vol. 6, p. 503-556, 2005.
Fonteneau R., Murphy S., Wehenkel L., Ernst D., « Model-free Monte Carlo-like policy evaluation », Proceedings of The Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, JMLR: W&CP 9, Chia Laguna, Sardinia, Italy, p. 217-224, 2010.
Munos R., Szepesvári C., « Finite-time bounds for fitted value iteration », Journal of Machine Learning Research, vol. 9, p. 815-857, 2008.
Ormoneit D., Sen S., « Kernel-based reinforcement learning », Machine Learning, vol. 49, no 2-3, p. 161-178, 2002.
Riedmiller M., « Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method », Proceedings of the Sixteenth European Conference on Machine Learning (ECML 2005), Porto, Portugal, p. 317-328, 2005.
Rummery G., Niranjan M., On-line Q-learning using connectionist systems, Technical Report no 166, Cambridge University Engineering Department, 1994.
Sutton R., « Learning to predict by the methods of temporal differences », Machine Learning, vol. 3, p. 9-44, 1988.
Sutton R. S., Maei H. R., Precup D., Bhatnagar S., Silver D., Szepesvári C., Wiewiora E., « Fast gradient-descent methods for temporal-difference learning with linear function approximation », Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, ACM, New York, NY, USA, p. 993-1000, 2009.
Tsitsiklis J., « Asynchronous stochastic approximation and Q-learning », Machine Learning, vol. 16, p. 185-202, 1994.
Watkins C., Dayan P., « Q-Learning », Machine Learning, vol. 8, no 3-4, p. 279-292, 1992.