Abstract:
The Minimum Covariance Determinant (MCD) estimator is a highly robust procedure for estimating the
center and shape of a high-dimensional data set. It consists of determining a subsample of h points out
of n that minimizes the generalized variance, i.e. the determinant of the sample covariance matrix of the
subsample. By definition, the computation of this estimator gives rise to a combinatorial optimization
problem, for which several approximate algorithms have been developed.
Some of these approximations are quite powerful, but they do not take advantage of any smoothness in the
objective function. This paper focuses on the approach, outlined within a general framework by Critchley et
al. (2009), that transforms any discrete, high-dimensional combinatorial problem of this type into a
continuous, low-dimensional one. The idea is to build on the general algorithm proposed by Critchley et
al. (2009) in order to take into account the particular features of the MCD methodology. More specifically,
the main goals of this paper are the adaptation of the algorithm to the specific MCD objective function and
the comparison of this “specialized” algorithm with the usual competitors for computing the MCD. The
adaptation focuses on the design of “clever” starting points in order to systematically investigate the search
domain. Accordingly, a new and surprisingly efficient procedure based on the well-known k-means algorithm
is constructed. The adapted algorithm, called RelaxMCD, is then compared by means of simulations and
examples with FASTMCD and the Feasible Subset Algorithm, both benchmark algorithms for computing
the MCD. As a by-product, it is shown that RelaxMCD is a general technique encompassing the other two,
yielding insight into their overall good performance.
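
To make the combinatorial problem concrete, the sketch below (in Python, not the authors' implementation) evaluates the MCD objective, i.e. the determinant of the covariance matrix of an h-subset, and builds a few k-means-based starting subsets; the function names, the number of clusters, and the rule of taking the h points nearest each cluster center are illustrative assumptions, not the RelaxMCD procedure itself.

```python
# Minimal sketch, assuming numpy and scikit-learn; names and choices below are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def mcd_objective(X, subset):
    """Generalized variance of an h-subset: the determinant of its sample covariance matrix."""
    return np.linalg.det(np.cov(X[subset], rowvar=False))

def kmeans_starting_subsets(X, h, n_clusters=5, random_state=0):
    """Hypothetical 'clever' starting points: for each k-means cluster centre,
    take the h observations of the full sample that are closest to it."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    subsets = []
    for c in km.cluster_centers_:
        d = np.linalg.norm(X - c, axis=1)   # Euclidean distances to the centre
        subsets.append(np.argsort(d)[:h])   # indices of the h nearest points
    return subsets

# Toy data: a clean bulk plus a group of outliers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (80, 3)), rng.normal(6, 1, (20, 3))])
n, p = X.shape
h = (n + p + 1) // 2                        # usual default subset size
best = min(kmeans_starting_subsets(X, h), key=lambda s: mcd_objective(X, s))
print("smallest determinant found:", mcd_objective(X, best))
```

In a full algorithm each starting subset would then be refined (for instance by concentration steps, as in FASTMCD, or by the continuous relaxation of Critchley et al. (2009)); here only the raw objective values of the starting subsets are compared.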