The Minimum Covariance Determinant (MCD) estimator is a highly robust procedure for estimating the
center and shape of a high-dimensional data set. It consists of determining a subsample of h points out
of n which minimizes the generalized variance. By definition, computing this estimator gives rise
to a combinatorial optimization problem, for which several approximate algorithms have been developed.
Some of these approximations are quite powerful, but they do not take advantage of any smoothness in the
objective function. This paper focuses on the approach outlined in a general framework in Critchley et
al. (2009), which transforms any discrete, high-dimensional combinatorial problem of this type into a
continuous, low-dimensional one. The idea is to build on the general algorithm proposed by Critchley et
al. (2009) in order to take into account the particular features of the MCD methodology. More specifically,
the main goals of this paper are to adapt the algorithm to the specific MCD target function and to compare this
“specialized” algorithm with the usual competitors for computing the MCD. The
adaptation focuses on the design of “clever” starting points in order to systematically investigate the search
domain. Accordingly, a new and surprisingly efficient procedure based on the well-known k-means algorithm
is constructed. The adapted algorithm, called RelaxMCD, is then compared by means of simulations and
examples with FASTMCD and the Feasible Subset Algorithm, both benchmark algorithms for computing
MCD. As a by-product, it is shown that RelaxMCD is a general technique encompassing the other two,
which yields insight into their overall good performance.
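
The definition above (choose h of the n points so that the determinant of their sample covariance matrix is smallest) can be illustrated with a minimal brute-force sketch. This is not RelaxMCD, FASTMCD or the Feasible Subset Algorithm; it only enumerates every h-subset to show why exact computation is a combinatorial problem. The function names, the toy data and the n - 1 covariance divisor are assumptions made for illustration, and only the Python standard library is used.

```python
from itertools import combinations


def mean_and_cov(points):
    """Sample mean vector and covariance matrix (divisor len(points) - 1)."""
    m = len(points)
    p = len(points[0])
    mean = [sum(x[j] for x in points) / m for j in range(p)]
    cov = [[sum((x[j] - mean[j]) * (x[k] - mean[k]) for x in points) / (m - 1)
            for k in range(p)] for j in range(p)]
    return mean, cov


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def generalized_variance(data, subset):
    """Determinant of the covariance matrix of the points indexed by subset."""
    _, cov = mean_and_cov([data[i] for i in subset])
    return det(cov)


def brute_force_mcd(data, h):
    """Exhaustive search over all h-subsets; only feasible for tiny n."""
    best = min(combinations(range(len(data)), h),
               key=lambda s: generalized_variance(data, s))
    mean, cov = mean_and_cov([data[i] for i in best])
    return best, mean, cov


# Toy data: six tightly clustered points plus two gross outliers (indices 6, 7).
data = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
        (-0.2, -0.2), (0.0, -0.1), (8.0, 9.0), (9.0, 8.5)]
subset, center, shape = brute_force_mcd(data, h=6)
print(subset, center)
```

With n = 8 and h = 6 there are only 28 subsets to check, but the count is the binomial coefficient "n choose h" and grows very quickly with n, which is why approximate algorithms are used in practice.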
Research center :
QuantOM
Disciplines :
Mathematics
Author, co-author :
Schyns, Michael ; Université de Liège - ULiège > HEC - École de gestion de l'ULiège > Informatique de gestion
Haesbroeck, Gentiane ; Université de Liège - ULiège > Département de mathématique > Statistique (aspects théoriques)
Critchley, Frank ; The Open University > Department of Mathematics and Statistics
Language :
English
Title :
RelaxMCD: smooth optimisation for the Minimum Covariance Determinant estimator
References :
Agulló J. Computing the minimum covariance determinant estimator (1998), Universidad de Alicante
Bernholt T., and Fischer P. The complexity of computing the MCD-estimator. Theoretical Computer Science 326 (2004) 383-398
Butler R.W., Davies P.L., and Jhun M. Asymptotics for the minimum covariance determinant estimator. The Annals of Statistics 21 (1993) 1385-1400
Critchley F., Schyns M., Haesbroeck G., Fauconnier C., Lu G., Atkinson R.A., and Wang D.Q. A relaxed approach to combinatorial problems in robustness and diagnostics. Statistics and Computing (2009) forthcoming
García-Escudero L.M., and Gordaliza A. The importance of the scales in heterogeneous robust clustering. Computational Statistics and Data Analysis 51 (2007) 4403-4412
Hawkins D.M. The feasible solution algorithm for the minimum covariance determinant estimator in multivariate data. Computational Statistics and Data Analysis 17 (1994) 197-210
Hawkins D.M., and Olive D.J. Improved feasible solution algorithms for high breakdown estimators. Computational Statistics and Data Analysis 30 (1999) 1-11
Hawkins D.M., and Olive D.J. Inconsistency of resampling algorithms for high-breakdown regression estimators and a new algorithm. Journal of the American Statistical Association 97 (2002) 136-148
Horst R., and Tuy H. Global Optimization: Deterministic Approaches. 3rd ed. (1995), Springer
Johnson R.A., and Wichern D.W. Applied Multivariate Statistical Analysis. 3rd ed. (1992), Prentice-Hall
Pardalos P.M., and Rosen J.B. Constrained Global Optimization: Algorithms and Applications. Lecture Notes in Computer Science (1987), Springer-Verlag, New York
Peña D., and Prieto F.J. Multivariate outlier detection and robust covariance matrix estimation. Technometrics 43 (2001) 286-303
Rousseeuw P.J. Multivariate estimation with high breakdown point. In: Grossmann W., Pflug G., Vincze I., and Wertz W. (Eds). Mathematical Statistics and Applications vol. B (1985), Dordrecht, Reidel 283-297
Rousseeuw P.J., and Leroy A.M. Robust Regression and Outlier Detection (1987), John Wiley, New York
Rousseeuw P.J., and Van Driessen K. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41 (1999) 212-223
Todorov V. Computing the minimum covariance determinant estimator (MCD) by simulated annealing. Computational Statistics and Data Analysis 14 (1992) 515-525
Woodruff D.L., and Rocke D.M. Computable robust estimation of multivariate location and shape in high dimension using compound estimators. Journal of the American Statistical Association 89 (1994) 888-896