[en] The k-means method is used in classification to group similar observations into k groups. When a second sample is available to test the resulting grouping, the misclassification rate can be computed. If the samples are generated from a mixture of two homoscedastic, spherically symmetric distributions, this misclassification rate equals that of the Bayes rule, so the k-means method is optimal under such a mixture model. However, it is not robust to outliers in the dataset used to construct the groups. To address this problem, the k-means procedure has been adapted in many ways. This presentation focuses on the trimmed k-means method, which is defined by trimming some of the observations. Besides its resistance to outliers, this method has the advantage of preserving optimality. It is well known, however, that trimming observations leads to a loss in classification efficiency. This loss can be measured by means of the influence function of the misclassification rate.
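To make the trimming idea concrete, here is a minimal sketch of one common way to compute trimmed k-means: Lloyd-type iterations in which, at each step, the proportion alpha of observations farthest from their nearest center is set aside before the centers are updated. This is an illustration under stated assumptions, not the procedure used in the presentation; the function name `trimmed_kmeans`, its parameters and the toy data are all hypothetical.

```python
import math


def trimmed_kmeans(points, init_centers, alpha, n_iter=100):
    """Lloyd-type trimmed k-means sketch (hypothetical helper, not from the source).

    points       : list of equal-length tuples of floats
    init_centers : list of k starting centers (same dimension as the points)
    alpha        : trimming proportion in [0, 1); alpha = 0 gives plain k-means
    Returns (centers, labels); a label of -1 marks a trimmed observation.
    """
    n, k = len(points), len(init_centers)
    n_keep = n - math.floor(alpha * n)  # number of observations retained
    centers = [list(c) for c in init_centers]
    labels = [-1] * n
    for _ in range(n_iter):
        # Squared distance from each point to its nearest center.
        nearest = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            j = min(range(k), key=d2.__getitem__)
            nearest.append((d2[j], j))
        # Keep the n_keep points closest to their center; trim the rest.
        order = sorted(range(n), key=lambda i: nearest[i][0])
        kept = set(order[:n_keep])
        new_labels = [nearest[i][1] if i in kept else -1 for i in range(n)]
        # Update each center as the mean of its retained members.
        for j in range(k):
            members = [points[i] for i in range(n) if new_labels[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: stop iterating
            break
        labels = new_labels
    return centers, labels


# Toy data (made up for illustration): two tight groups and one gross outlier.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
        (100.0, 100.0)]
start = [(0.0, 0.0), (5.0, 5.0)]

plain_centers, plain_labels = trimmed_kmeans(data, start, alpha=0.0)
trim_centers, trim_labels = trimmed_kmeans(data, start, alpha=0.2)
print(plain_labels)  # the outlier captures a group of its own
print(trim_labels)   # the outlier is trimmed (label -1)
```

On this toy dataset the untrimmed run (alpha = 0) lets the outlier take over one center, which merges the two genuine groups, whereas the trimmed run discards the outlier and recovers both groups. The sketch starts from fixed initial centers to stay deterministic; in practice several starting points would typically be tried.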
Disciplines :
Mathematics
Author, co-author :
Ruwet, Christel ; Université de Liège - ULiège > Département de mathématique > Statistique mathématique
Language :
English
Title :
Classification efficiency of the trimmed k-means procedure
Alternative titles :
[fr] Efficacité de classification de la méthode des k-moyennes tronquées