Abstract :
The paper addresses the problem of learning a regression model parameterized by a fixed-rank positive semidefinite matrix. The focus is on the nonlinear nature of the search space and on scalability to high-dimensional problems. The mathematical developments rely on the theory of gradient descent algorithms adapted to the Riemannian geometry that underlies the set of fixed-rank positive semidefinite matrices. In contrast with previous contributions in the literature, no restrictions are imposed on the range space of the learned matrix. The resulting algorithms maintain linear complexity in the problem size and enjoy important invariance properties. We apply the proposed algorithms to the problem of learning a distance function parameterized by a positive semidefinite matrix. Good performance is observed on classical benchmarks.
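As a concrete illustration of the approach, the sketch below performs online gradient descent in the factorized parameterization W = G Gᵀ, with G a d × r matrix, applied to learning a distance function from (pair, target-distance) examples. This is a minimal sketch under stated assumptions: the function names, the squared loss, and the step size are illustrative choices, not the paper's exact algorithm, and the plain Euclidean gradient in G used here corresponds only to the simplest flat geometry, whereas the paper develops the full Riemannian quotient geometry and its invariance properties. Each update costs O(dr) operations, which is the linear complexity in the problem size mentioned in the abstract.

```python
import numpy as np

def distance(G, x, z):
    """Squared distance d_W(x, z) = (x - z)^T W (x - z), with W = G G^T."""
    u = (x - z) @ G          # G^T (x - z), computed in O(d r)
    return u @ u

def sgd_step(G, x, z, y, eta=0.01):
    """One online gradient step on the squared loss (d_W(x, z) - y)^2.

    Since W = G G^T is positive semidefinite of rank at most r by
    construction, the update never leaves the search space.
    Gradient: grad_G (d_W - y)^2 = 4 (d_W - y) v v^T G, with v = x - z,
    so each step costs O(d r) -- linear in the dimension d.
    """
    v = x - z
    u = v @ G                    # G^T v, shape (r,)
    residual = u @ u - y         # d_W(x, z) - y
    G -= eta * 4.0 * residual * np.outer(v, u)
    return G

# Toy usage (hypothetical data): recover a rank-2 metric on 50-dimensional points.
rng = np.random.default_rng(0)
d, r = 50, 2
G = rng.standard_normal((d, r)) / np.sqrt(d)
for _ in range(2000):
    x, z = rng.standard_normal(d), rng.standard_normal(d)
    y = np.sum((x - z)[:2] ** 2)   # target: distance in the first two coordinates
    G = sgd_step(G, x, z, y)
```

The factorization is what makes the feasibility constraint free: any value of G yields a valid fixed-rank positive semidefinite W, so no projection step is needed after the update.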
Disciplines :
Computer science
Author, co-author :
Meyer, Gilles ; Université de Liège - ULiège > Department of Electrical Engineering and Computer Science (Institut Montefiore) > Systems and Modeling
Bonnabel, Silvère ; Mines ParisTech > Robotics Center
Sepulchre, Rodolphe ; Université de Liège - ULiège > Department of Electrical Engineering and Computer Science (Institut Montefiore) > Systems and Modeling
Language :
English
Title :
Regression on fixed-rank positive semidefinite matrices: a Riemannian approach
Publication date :
03 March 2011
Journal title :
Journal of Machine Learning Research
ISSN :
1532-4435
eISSN :
1533-7928
Publisher :
Microtome Publishing, Brookline, Massachusetts, United States