[en] Clustering procedures that allow for general covariance structures in the resulting clusters require some constraints on the solutions. With this in mind, several proposals have been introduced in the literature. The TCLUST procedure works with a restriction on the "eigenvalues-ratio" of the clusters' scatter matrices. To achieve robustness with respect to outliers, the procedure allows a proportion of the most outlying observations to be trimmed off. The resistance of TCLUST to infinitesimal contamination has already been studied. This paper examines its resistance to a larger amount of contamination by studying its breakdown behavior. The rather new concept of restricted breakdown point shows that the TCLUST procedure resists a proportion of contamination equal to the trimming rate as soon as the data set is sufficiently "well clustered".
Disciplines :
Mathematics
Author, co-author :
Ruwet, Christel; Université de Liège - ULiège > Département de mathématique > Statistique mathématique
Garcia-Escudero, Luis Angel; Universidad de Valladolid > Departamento de estadística e investigación operativa
Gordaliza, Alfonso; Universidad de Valladolid > Departamento de estadística e investigación operativa
Mayo-Iscar, Agustin; Universidad de Valladolid > Departamento de estadística e investigación operativa
Language :
English
Title :
On the breakdown behavior of the TCLUST clustering procedure
Publication date :
August 2013
Journal title :
TEST
ISSN :
1133-0686
eISSN :
1863-8260
Publisher :
Springer, Heidelberg, Germany
Volume :
22
Issue :
3
Pages :
466-487
Peer reviewed :
Peer reviewed
Funders :
The Spanish Ministerio de Ciencia y Tecnología and the FEDER grant MTM2011-28657-C02-01