On the breakdown behavior of TCLUST clustering procedure

Ruwet, Christel; Garcia-Escudero, Luis Angel; Gordaliza, Alfonso; Mayo-Iscar, Agustin

doi:10.1007/s11749-012-0312-4

Article (Scientific journals)

2013 • In TEST, 22 (3), p. 466-487

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/104215

Files

Full Text

On the breakdown behavior of TCLUST clustering procedure_preprint.pdf

Author preprint (391.82 kB)

The final publication is available at www.springerlink.com

All documents in ORBi are protected by a user license.

Details

Keywords :

Breakdown point; Clustering; Robustness; TCLUST; Trimming

Abstract :

Clustering procedures that allow for general covariance structures in the resulting clusters need some constraints on the solutions. With this in mind, several proposals have been introduced in the literature. The TCLUST procedure works with a restriction on the "eigenvalues-ratio" of the clusters' scatter matrices. To achieve robustness with respect to outliers, the procedure allows a proportion of the most outlying observations to be trimmed off. The resistance of TCLUST to infinitesimal contamination has already been studied. This paper examines its resistance to larger amounts of contamination through a study of its breakdown behavior. The rather new concept of restricted breakdown point shows that the TCLUST procedure resists a proportion of contamination equal to the trimming rate as soon as the data set is sufficiently "well clustered".
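
As a rough illustration of the two ingredients named in the abstract, the sketch below computes an eigenvalue ratio across cluster scatter matrices and trims a proportion of the most outlying observations. It is not the TCLUST algorithm itself: the helper names (`eigenvalues_2x2`, `eigenvalue_ratio`, `trim`), the restriction to symmetric 2x2 matrices, the use of precomputed outlyingness scores, and the choice of discarding floor(n * alpha) points are all assumptions made for the example.

```python
import math

def eigenvalues_2x2(s):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], smallest first."""
    (a, b), (_, d) = s
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def eigenvalue_ratio(scatter_matrices):
    """Largest eigenvalue divided by smallest, pooled over all clusters."""
    eigs = [lam for s in scatter_matrices for lam in eigenvalues_2x2(s)]
    return max(eigs) / min(eigs)

def trim(scores, alpha):
    """Indices kept after discarding the floor(n * alpha) largest scores.

    `scores` are hypothetical outlyingness values (larger = more outlying);
    the rounding rule is an assumption, not taken from the paper.
    """
    n_trim = math.floor(len(scores) * alpha)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:len(scores) - n_trim])

# Two hypothetical cluster scatter matrices.
scatters = [[[1.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [0.0, 2.0]]]
ratio = eigenvalue_ratio(scatters)

# Five hypothetical outlyingness scores; observation 2 is the clear outlier.
kept = trim([0.2, 0.1, 9.5, 0.3, 0.4], alpha=0.2)
```

A candidate solution would satisfy an eigenvalues-ratio restriction with constant `c` when `ratio <= c`; here the pooled eigenvalues are 1, 1, 2 and 4, so any `c` of at least 4 would admit these scatter matrices.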

Disciplines :

Mathematics

Author, co-author :

Ruwet, Christel ; Université de Liège - ULiège > Département de mathématique > Statistique mathématique

Garcia-Escudero, Luis Angel; Universidad de Valladolid > Departamento de estadística e investigación operativa

Gordaliza, Alfonso; Universidad de Valladolid > Departamento de estadística e investigación operativa

Mayo-Iscar, Agustin; Universidad de Valladolid > Departamento de estadística e investigación operativa

Language :

English

Title :

On the breakdown behavior of TCLUST clustering procedure

Publication date :

August 2013

Journal title :

TEST

ISSN :

1133-0686

eISSN :

1863-8260

Publisher :

Springer, Heidelberg, Germany

Volume :

22

Issue :

3

Pages :

466-487

Peer reviewed :

Peer Reviewed verified by ORBi

Funders :

The Spanish Ministerio de Ciencia y Tecnología and the FEDER grant MTM2011-28657-C02-01

Available on ORBi :

since 29 November 2011

Statistics

Number of views

215 (16 by ULiège)

Number of downloads

17 (5 by ULiège)
