Aloimonos, Y. (1990). Purposive and qualitative active vision. In Proc. of the 10th International Conference on Pattern Recognition, pp. 436-460.
Amit, Y., & Kong, A. (1996). Graphical templates for model registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(3), 225-236.
Asada, M., Noda, S., Tawaratsumida, S., & Hosoda, K. (1994). Vision-based behavior acquisition for a shooting robot by using a reinforcement learning. In Proc. of IAPR/IEEE Workshop on Visual Behaviors, pp. 112-118.
Bagnell, J., & Schneider, J. (2001). Autonomous helicopter control using reinforcement learning policy search methods. In Proc. of the International Conference on Robotics and Automation. IEEE.
Barto, A., Sutton, R., & Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics, 13(5), 834-846.
Bellman, R. (1957). Dynamic Programming. Princeton University Press.
Bertsekas, D., & Tsitsiklis, J. (1996). Neuro-Dynamic Programming. Athena Scientific.
Boigelot, B. (1999). Symbolic Methods for Exploring Infinite State Spaces. Ph.D. thesis, University of Liège, Liège (Belgium).
Boigelot, B., Jodogne, S., & Wolper, P. (2005). An effective decision procedure for linear arithmetic with integer and real variables. ACM Transactions on Computational Logic (TOCL), 6(3), 614-633.
Bouchard, G., & Triggs, B. (2005). Hierarchical part-based visual object categorization. In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 710-715, San Diego (CA, USA).
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and Regression Trees. Wadsworth International Group.
Bryant, R. (1986). Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, 35(8), 677-691.
Bryant, R. (1992). Symbolic boolean manipulation with ordered binary decision diagrams. ACM Computing Surveys, 24(3), 293-318.
Burl, M., & Perona, P. (1996). Recognition of planar object classes. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 223-230, San Francisco (CA, USA).
Chapman, D., & Kaelbling, L. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. In Proc. of the 12th International Joint Conference on Artificial Intelligence (IJCAI), pp. 726-731, Sydney.
Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proc. of the National Conference on Artificial Intelligence (AAAI), pp. 183-188.
Coelho, J., Piater, J., & Grupen, R. (2001). Developing haptic and visual perceptual categories for reaching and grasping with a humanoid robot. Robotics and Autonomous Systems, special issue on Humanoid Robots, 37(2-3), 195-218.
Crandall, D., & Huttenlocher, D. (2006). Weakly supervised learning of part-based spatial models for visual object recognition. In Proc. of the 9th European Conference on Computer Vision.
Delzanno, G., & Raskin, J.-F. (2000). Symbolic representation of upward closed sets. In Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science, pp. 426-440, Berlin (Germany).
Derman, C. (1970). Finite State Markovian Decision Processes. Academic Press, New York.
Ernst, D., Geurts, P., & Wehenkel, L. (2003). Iteratively extending time horizon reinforcement learning. In Proc. of the 14th European Conference on Machine Learning, pp. 96-107, Dubrovnik (Croatia).
Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6, 503-556.
Felzenszwalb, P., & Huttenlocher, D. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55-79.
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 264-271, Madison (WI, USA).
Fischler, M., & Elschlager, R. (1973). The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1), 67-92.
Forsyth, D., Haddon, J., & Ioffe, S. (1999). Finding objects by grouping primitives. In Shape, Contour and Grouping in Computer Vision, pp. 302-318, London (UK). Springer-Verlag.
Gaskett, C., Fletcher, L., & Zelinsky, A. (2000). Reinforcement learning for visual servoing of a mobile robot. In Proc. of the Australian Conference on Robotics and Automation, Melbourne (Australia).
Gibson, E., & Spelke, E. (1983). The development of perception. In Flavell, J. H., & Markman, E. M. (Eds.), Handbook of Child Psychology Vol. III: Cognitive Development (4th edition), chap. 1, pp. 2-76. Wiley.
Givan, R., Dean, T., & Greig, M. (2003). Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, 147(1-2), 163-223.
Gordon, G. (1995). Stable function approximation in dynamic programming. In Proc. of the International Conference on Machine Learning, pp. 261-268.
Gouet, V., & Boujemaa, N. (2001). Object-based queries using color points of interest. In IEEE Workshop on Content-Based Access of Image and Video Libraries, pp. 30-36, Kauai (HI, USA).
Howard, R. (1960). Dynamic Programming and Markov Processes. Technology Press and Wiley, Cambridge (MA) and New York.
Huber, M., & Grupen, R. (1998). A control structure for learning locomotion gaits. In 7th Int. Symposium on Robotics and Applications, Anchorage (AK, USA). TSI Press.
Iida, M., Sugisaka, M., & Shibata, K. (2002). Direct-vision-based reinforcement learning to a real mobile robot. In Proc. of the International Conference on Neural Information Processing (ICONIP), Vol. 5, pp. 2556-2560.
Jaakkola, T., Jordan, M., & Singh, S. (1994). Convergence of stochastic iterative dynamic programming algorithms. In Cowan, J. D., Tesauro, G., & Alspector, J. (Eds.), Advances in Neural Information Processing Systems, Vol. 6, pp. 703-710. Morgan Kaufmann Publishers.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264-323.
Jodogne, S., & Piater, J. (2005a). Interactive learning of mappings from visual percepts to actions. In De Raedt, L., & Wrobel, S. (Eds.), Proc. of the 22nd International Conference on Machine Learning (ICML), pp. 393-400, Bonn (Germany). ACM.
Jodogne, S., & Piater, J. (2005b). Learning, then compacting visual policies (extended abstract). In Proc. of the 7th European Workshop on Reinforcement Learning (EWRL), pp. 8-10, Napoli (Italy).
Jodogne, S., Scalzo, F., & Piater, J. (2005). Task-driven learning of spatial combinations of visual features. In Proc. of the IEEE Workshop on Learning in Computer Vision and Pattern Recognition, San Diego (CA, USA). IEEE.
Kaelbling, L., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2), 99-134.
Kaelbling, L., Littman, M., & Moore, A. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
Kimura, H., Yamashita, T., & Kobayashi, S. (2001). Reinforcement learning of walking behavior for a four-legged robot. In Proc. of the 40th IEEE Conference on Decision and Control, Orlando (FL, USA).
Kohl, N., & Stone, P. (2004). Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proc. of the IEEE International Conference on Robotics and Automation, pp. 2619-2624, New Orleans.
Kumar, M., Torr, P., & Zisserman, A. (2004). Extending pictorial structures for object recognition. In Proc. of the British Machine Vision Conference.
Kwok, C., & Fox, D. (2004). Reinforcement learning for sensing strategies. In Proc. of the IEEE International Conference on Intelligent Robots and Systems.
Lagoudakis, M., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107-1149.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110.
Martinez-Marín, T., & Duckett, T. (2005). Fast reinforcement learning for vision-guided mobile robots. In Proc. of the IEEE International Conference on Robotics and Automation, pp. 18-22, Barcelona (Spain).
McCallum, R. (1996). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. thesis, University of Rochester, New York.
Michels, J., Saxena, A., & Ng, A. (2005). High speed obstacle avoidance using monocular vision and reinforcement learning. In Proc. of the 22nd International Conference on Machine Learning, pp. 593-600, Bonn (Germany).
Mikolajczyk, K., & Schmid, C. (2003). A performance evaluation of local descriptors. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 257-263, Madison (WI, USA).
Moore, A., & Atkeson, C. (1995). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Machine Learning, 21(3), 199-233.
Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291-323.
Nene, S., Nayar, S., & Murase, H. (1996). Columbia object image library (COIL-100). Tech. rep. CUCS-006-96, Columbia University, New York.
Ng, A., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, B., & Liang, E. (2004). Inverted autonomous helicopter flight via reinforcement learning. In Proc. of the International Symposium on Experimental Robotics.
Paletta, L., Fritz, G., & Seifert, C. (2005). Q-learning of sequential attention for visual object recognition from informative local descriptors. In Proc. of the 22nd International Conference on Machine Learning (ICML), pp. 649-656, Bonn (Germany).
Paletta, L., & Pinz, A. (2000). Active object recognition by view integration and reinforcement learning. Robotics and Autonomous Systems, 31(1-2), 71-86.
Peng, J., & Bhanu, B. (1998). Closed-loop object recognition using reinforcement learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(2), 139-154.
Piater, J. (2001). Visual Feature Learning. Ph.D. thesis, University of Massachusetts, Computer Science Department, Amherst (MA, USA).
Puterman, M., & Shin, M. (1978). Modified policy iteration algorithms for discounted Markov decision problems. Management Science, 24, 1127-1137.
Pyeatt, L., & Howe, A. (2001). Decision tree function approximation in reinforcement learning. In Proc. of the Third International Symposium on Adaptive Systems, pp. 70-77, Havana, Cuba.
Quinlan, J. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (CA, USA).
Randløv, J., & Alstrøm, P. (1998). Learning to drive a bicycle using reinforcement learning and shaping. In Proc. of the 15th International Conference on Machine Learning, pp. 463-471, Madison (WI, USA). Morgan Kaufmann.
Rummery, G., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Tech. rep. CUED/F-INFENG/TR 166, Cambridge University.
Salganicoff, M. (1993). Density-adaptive learning and forgetting. In Proc. of the 10th International Conference on Machine Learning, pp. 276-283, Amherst (MA, USA). Morgan Kaufmann Publishers.
Scalzo, F., & Piater, J. (2006). Unsupervised learning of dense hierarchical appearance representations. In Proc. of the 18th International Conference on Pattern Recognition, Hong Kong.
Schaal, S. (1997). Learning from demonstration. In Mozer, M. C., Jordan, M., & Petsche, T. (Eds.), Advances in Neural Information Processing Systems, Vol. 9, pp. 1040-1046. MIT Press.
Schmid, C., & Mohr, R. (1997). Local greyvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5), 530-535.
Schmid, C., Mohr, R., & Bauckhage, C. (2000). Evaluation of interest point detectors. International Journal of Computer Vision, 37(2), 151-172.
Schyns, P., & Rodet, L. (1997). Categorization creates functional features. Journal of Experimental Psychology: Learning, Memory and Cognition, 23(3), 681-696.
Shibata, K., & Iida, M. (2003). Acquisition of box pushing by direct-vision-based reinforcement learning. In Proc. of the Society of Instrument and Control Engineers Annual Conference, p. 6.
Singh, S., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In Advances in Neural Information Processing Systems, Vol. 7, pp. 361-368. MIT Press.
Sudderth, E., Ihler, A., Freeman, W., & Willsky, A. (2003). Nonparametric belief propagation. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 605-612.
Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9-44.
Sutton, R., & Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press.
Takahashi, Y., Takeda, M., & Asada, M. (1999). Continuous valued Q-learning for vision-guided behavior acquisition. In Proc. of the International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 255-260.
Tarr, M., & Cheng, Y. (2003). Learning to see faces and objects. Trends in Cognitive Sciences, 7(1), 23-30.
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58-68.
Uther, W., & Veloso, M. (1998). Tree based discretization for continuous state space reinforcement learning. In Proc. of the 15th National Conference on Artificial Intelligence (AAAI), pp. 769-774, Madison (WI, USA).
Watkins, C. (1989). Learning From Delayed Rewards. Ph.D. thesis, King's College, Cambridge (UK).
Weber, C., Wermter, S., & Zochios, A. (2004). Robot docking with neural vision and reinforcement. Knowledge-Based Systems, 17(2-4), 165-172.
Wettergreen, D., Gaskett, C., & Zelinsky, A. (1999). Autonomous guidance and control for an underwater robotic vehicle. In Proc. of the International Conference on Field and Service Robotics, Pittsburgh (USA).
Whitehead, S., & Ballard, D. (1991). Learning to perceive and act by trial and error. Machine Learning, 7, 45-83.
Yin, P.-Y. (2002). Maximum entropy-based optimal threshold selection using deterministic reinforcement learning with controlled randomization. Signal Processing, 82, 993-1006.
Yoshimoto, J., Ishii, S., & Sato, M. (1999). Application of reinforcement learning to balancing ACROBOT. In Proc. of the 1999 IEEE International Conference on Systems, Man and Cybernetics, pp. 516-521.