Approximate Policy Iteration (API) is a reinforcement learning paradigm capable of solving high-dimensional, continuous control problems. We propose to exploit API for the closed-loop learning of mappings from images to actions. This approach requires a family of function approximators that map visual percepts to a real-valued function. For this purpose, we use Regression Extra-Trees, a fast yet accurate and versatile machine learning algorithm. The inputs of the Extra-Trees consist of a set of visual features that digest the informative patterns in the visual signal. We also show how to parallelize the Extra-Trees learning process to further reduce the computational expense, which is often essential in visual tasks. Experimental results on real-world images indicate that the combination of API with Extra-Trees is a promising framework for the interactive learning of visual tasks.
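To make the control loop concrete, here is a minimal sketch of approximate policy iteration on a toy deterministic chain MDP. The paper fits the Q-function with Regression Extra-Trees over visual features; in this self-contained illustration a plain lookup table stands in for the regressor, and all names (states, actions, horizon) are hypothetical, not taken from the paper.

```python
# Sketch of approximate policy iteration (API) on a toy chain MDP.
# A dict stands in for the Extra-Trees regressor used in the paper.

GAMMA = 0.9
N_STATES = 5          # states 0..4; state 4 is terminal with reward 1
ACTIONS = ("left", "right")

def step(s, a):
    """Deterministic transition; reward 1 on reaching the terminal state."""
    s2 = min(s + 1, N_STATES - 1) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def rollout_return(s, a, policy, horizon=20):
    """Discounted return of taking action a in s, then following policy."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        s, r = step(s, a)
        total += discount * r
        if s == N_STATES - 1:
            break
        discount *= GAMMA
        a = policy[s]
    return total

def policy_iteration(n_iter=5):
    policy = {s: "left" for s in range(N_STATES - 1)}  # poor initial policy
    for _ in range(n_iter):
        # Policy evaluation: regress observed returns onto (state, action)
        # pairs -- the step where the paper plugs in Extra-Trees.
        q = {(s, a): rollout_return(s, a, policy)
             for s in range(N_STATES - 1) for a in ACTIONS}
        # Policy improvement: act greedily with respect to the fitted Q.
        policy = {s: max(ACTIONS, key=lambda a: q[(s, a)])
                  for s in range(N_STATES - 1)}
    return policy

print(policy_iteration())  # the learned policy moves right in every state
```

The alternation between fitting Q from rollouts and acting greedily is the essence of API; swapping the table for a tree-based regressor over image features is what lets the same loop scale to visual percepts.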
Disciplines:
Computer science
Author, co-author:
Jodogne, Sébastien ; Centre Hospitalier Universitaire de Liège - CHU > Radiothérapie
Briquet, Cyril ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Informatique (ingénierie du logiciel et algorithmique)
Piater, Justus ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > INTELSIG Group
Document language:
English
Title:
Approximate Policy Iteration for Closed-Loop Learning of Visual Tasks