The TCLUST procedure performs robust clustering with the aim of finding clusters with different scatter structures and proportions. TCLUST imposes an eigenvalue-ratio constraint in order to avoid finding spurious clusters. To guarantee robustness against outliers and background noise, the method trims a given proportion of observations, with the trimmed observations self-determined by the data. This article studies robustness properties of the TCLUST procedure by means of the influence function, obtaining a robustness behavior close to that of the trimmed k-means.
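As a reading aid, the two technical notions named in the abstract can be written in their standard textbook forms. The notation here is an assumption and is not taken from this record: k clusters in dimension p with scatter matrices \Sigma_j, eigenvalues \lambda_l(\Sigma_j), a constant c \ge 1, a statistical functional T, a distribution F, and the point mass \Delta_x at x. The article itself may use different symbols.

```latex
% Eigenvalue-ratio constraint: the largest eigenvalue over all cluster
% scatter matrices may exceed the smallest by at most a factor c.
\[
  \frac{\max_{j=1,\dots,k}\,\max_{l=1,\dots,p}\lambda_l(\Sigma_j)}
       {\min_{j=1,\dots,k}\,\min_{l=1,\dots,p}\lambda_l(\Sigma_j)}
  \;\le\; c, \qquad c \ge 1 .
\]

% Influence function of a functional T at F: the limiting effect of an
% infinitesimal point contamination at x.
\[
  \mathrm{IF}(x;\,T,\,F)
  \;=\;
  \lim_{\varepsilon \downarrow 0}
  \frac{T\bigl((1-\varepsilon)F + \varepsilon\,\Delta_x\bigr) - T(F)}{\varepsilon} .
\]
```

In this notation, c = 1 forces every scatter matrix to be the same multiple of the identity, which is the most constrained, spherical case. A bounded influence function means that a single outlying point x can shift the estimate only by a limited amount.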
Disciplines :
Mathematics
Author, co-author :
Ruwet, Christel; Université de Liège - ULiège > Département de mathématique > Statistique mathématique
García-Escudero, Luis Angel; Universidad de Valladolid - UVa
Gordaliza, Alfonso; Universidad de Valladolid - UVa
Mayo-Iscar, Agustin; Universidad de Valladolid - UVa
Language :
English
Title :
The influence function of the TCLUST robust clustering procedure
Publication date :
2012
Journal title :
Advances in Data Analysis and Classification
ISSN :
1862-5347
eISSN :
1862-5355
Publisher :
Springer, Germany
Volume :
6
Issue :
2
Pages :
107-130
Peer reviewed :
Peer Reviewed verified by ORBi
Funders :
Spanish Ministerio de Ciencia e Innovación
FWB - Fédération Wallonie-Bruxelles