Abstract:
[en] Cluster analysis is a set of multivariate procedures for detecting natural groupings in data. These methods aim to group a set of objects so that objects in the same group (called a cluster) are more similar to each other than to those in other groups. The need to organize data into sensible groupings arises naturally in many scientific fields, such as psychology, biology, statistics, bioinformatics, and marketing.
However, the solution obtained is not unique and depends strongly on the analyst’s choices. The representation and normalization scheme, the distance measure, the clustering algorithm, the number of clusters, and the interpretation of the clusters are all subjective choices that change the final output. These decisions are guided mainly by the purpose of the grouping, domain knowledge, and the individual data set. Cluster validity assessment should therefore be performed to evaluate the obtained clusters and to find the partitioning that best fits the underlying data.
I provide a brief overview of clustering, summarize well-known algorithms, and discuss the major challenges and key issues in performing cluster analysis.
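
As an illustration of the cluster validity assessment mentioned above, the following minimal sketch compares two candidate partitions of the same data using the mean silhouette width. The abstract does not name a particular validity index, so the choice of the silhouette, the Euclidean distance, the toy data, and the names `silhouette`, `points`, `two_clusters`, and `three_clusters` are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette(points, labels):
    """Mean silhouette width of a partition (closer to 1 = better separated)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from p to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))

        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Convention: a singleton cluster (or a single-cluster partition)
            # contributes a silhouette of 0.
            scores.append(0.0)
            continue

        a = sum(own) / len(own)  # mean distance within p's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two visually obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Two candidate partitions an analyst might obtain with different settings.
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 1, 2, 2, 2]  # the first group is split further

print("k=2:", silhouette(points, two_clusters))
print("k=3:", silhouette(points, three_clusters))
```

On this toy data the two-cluster partition scores higher than the three-cluster one, which matches the visual structure. A different distance measure or normalization could change the ranking, which is the dependence on the analyst's choices described above.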