RelaxMCD: smooth optimisation for the Minimum Covariance Determinant estimator

Schyns, Michael; Haesbroeck, Gentiane; Critchley, Frank

doi:10.1016/j.csda.2009.11.005

Article (Scientific journals)

2010 • In Computational Statistics and Data Analysis, 54 (4), p. 843-857

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/12074

DOI
10.1016/j.csda.2009.11.005

Files

Full Text

relaxMCD_final.pdf

Author postprint (397.87 kB)

The original publication is available at www.sciencedirect.com (csda)

All documents in ORBi are protected by a user license.

Details

Keywords :

MCD estimator; resampling algorithms; k-means; robustness

Abstract :

[en] The Minimum Covariance Determinant (MCD) estimator is a highly robust procedure for estimating the center and shape of a high-dimensional data set. It consists of determining a subsample of h points out of n that minimizes the generalized variance. By definition, computing this estimator gives rise to a combinatorial optimization problem, for which several approximate algorithms have been developed. Some of these approximations are quite powerful, but they do not take advantage of any smoothness in the objective function. This paper focuses on the approach outlined in a general framework in Critchley et al. (2009), which transforms any discrete, high-dimensional combinatorial problem of this type into a continuous, low-dimensional one. The idea is to build on the general algorithm proposed by Critchley et al. (2009) so as to take into account the particular features of the MCD methodology. More specifically, the main goals of this paper are to adapt the algorithm to the specific MCD target function and to compare this “specialized” algorithm with the usual competitors for computing MCD. The adaptation focuses on the design of “clever” starting points in order to investigate the search domain systematically. Accordingly, a new and surprisingly efficient procedure based on the well-known k-means algorithm is constructed. The adapted algorithm, called RelaxMCD, is then compared, by means of simulations and examples, with FASTMCD and the Feasible Subset Algorithm, both benchmark algorithms for computing MCD. As a by-product, it is shown that RelaxMCD is a general technique encompassing the two others, which yields insight into their overall good performance.
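
The definition in the abstract (choose the h points out of n whose covariance matrix has the smallest determinant) can be illustrated with a minimal sketch. This is not RelaxMCD, FASTMCD, or the Feasible Subset Algorithm: it is an exhaustive search over all h-subsets, usable only on tiny examples, and it is restricted to two-dimensional data so that the determinant can be written out by hand. The function names `covariance_det_2d` and `brute_force_mcd` and the toy data are assumptions made for this illustration.

```python
from itertools import combinations


def covariance_det_2d(points):
    """Determinant of the sample covariance matrix of a list of 2-D points."""
    m = len(points)
    mean_x = sum(p[0] for p in points) / m
    mean_y = sum(p[1] for p in points) / m
    # Entries of the 2x2 sample covariance matrix (divisor m - 1).
    # The choice of divisor rescales every subset equally, so it does not
    # change which subset attains the minimum.
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (m - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (m - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (m - 1)
    return sxx * syy - sxy ** 2


def brute_force_mcd(data, h):
    """Return (indices, determinant) of the h-subset with the smallest
    covariance determinant, found by enumerating every h-subset."""
    best_subset, best_det = None, float("inf")
    for subset in combinations(range(len(data)), h):
        det = covariance_det_2d([data[i] for i in subset])
        if det < best_det:
            best_subset, best_det = subset, det
    return best_subset, best_det


# Toy data (hypothetical): five points near the unit square plus two outliers.
data = [(0.0, 0.0), (1.0, 0.2), (0.1, 1.0), (1.1, 1.1), (0.5, 0.4),
        (9.0, -8.0), (10.0, 12.0)]
subset, det = brute_force_mcd(data, h=5)
print(subset, det)
```

On this toy data the selected subset leaves out both outlying points. The number of candidate subsets is the binomial coefficient "n choose h", which grows too quickly for exhaustive search on realistic n; this is the combinatorial difficulty that motivates the approximate algorithms discussed in the abstract.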

Research Center/Unit :

QuantOM

Disciplines :

Mathematics

Author, co-author :

Schyns, Michael ; Université de Liège - ULiège > HEC - École de gestion de l'ULiège > Informatique de gestion

Haesbroeck, Gentiane ; Université de Liège - ULiège > Département de mathématique > Statistique (aspects théoriques)

Critchley, Frank ; The Open University > Department of Mathematics and Statistics

Language :

English

Title :

RelaxMCD: smooth optimisation for the Minimum Covariance Determinant estimator

Publication date :

April 2010

Journal title :

Computational Statistics and Data Analysis

ISSN :

0167-9473

eISSN :

1872-7352

Publisher :

Elsevier Science, Amsterdam, Netherlands

Volume :

54

Issue :

4

Pages :

843-857

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

http://dx.doi.org/10.1016/j.csda.2009.11.005

Available on ORBi :

since 04 May 2009

Statistics

Number of views

268 (44 by ULiège)

Number of downloads

11 (8 by ULiège)

Bibliography

Agulló J. Computing the minimum covariance determinant estimator (1998), Universidad de Alicante
Bernholt T., and Fischer P. The complexity of computing the MCD-estimator. Theoretical Computer Science 326 (2004) 383-398
Butler R.W., Davies P.L., and Jhun M. Asymptotics for the minimum covariance determinant estimator. The Annals of Statistics 21 (1993) 1385-1400
Critchley F., Schyns M., Haesbroeck G., Fauconnier C., Lu G., Atkinson R.A., and Wang D.Q. A relaxed approach to combinatorial problems in robustness and diagnostics. Statistics and Computing (2009), forthcoming
García-Escudero L.M., and Gordaliza A. The importance of the scales in heterogeneous robust clustering. Computational Statistics and Data Analysis 51 (2007) 4403-4412
Hawkins D.M. The feasible solution algorithm for the minimum covariance determinant estimator in multivariate data. Computational Statistics and Data Analysis 17 (1994) 197-210
Hawkins D.M., and Olive D.J. Improved feasible solution algorithms for high breakdown estimators. Computational Statistics and Data Analysis 30 (1999) 1-11
Hawkins D.M., and Olive D.J. Inconsistency of resampling algorithms for high-breakdown regression estimators and a new algorithm. Journal of the American Statistical Association 97 (2002) 136-148
Horst R., and Tuy H. Global optimization. Deterministic Approaches. 3rd ed. (1995), Springer
Johnson R.A., and Wichern D.W. Applied Multivariate Statistical Analysis. 3rd ed. (1992), Prentice-Hall
Pardalos P.M., and Rosen J.B. Constrained Global Optimization: Algorithms and Applications. Lecture Notes in Computer Science (1987), Springer-Verlag, New York
Peña D., and Prieto F.J. Multivariate outlier detection and robust covariance matrix estimation. Technometrics 43 (2001) 286-303
Rousseeuw P.J. Multivariate estimation with high breakdown point. In: Grossmann W., Pflug G., Vincze I., and Wertz W. (Eds). Mathematical Statistics and Applications vol. B (1985), Dordrecht, Reidel 283-297
Rousseeuw P.J., and Leroy A.M. Robust Regression and Outlier Detection (1987), John Wiley, New York
Rousseeuw P.J., and Van Driessen K. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41 (1999) 212-223
Todorov V. Computing the minimum covariance determinant estimator (MCD) by simulated annealing. Computational Statistics and Data Analysis 14 (1992) 515-525
Woodruff D.L., and Rocke D.M. Computable robust estimation of multivariate location and shape in high dimension using compound estimators. Journal of the American Statistical Association 89 (1994) 888-896