Abstract:
Introduction
Cluster analysis is a well-known method for exploring similarity between patients. In chronic obstructive pulmonary disease (COPD), recent studies have attempted to identify homogeneous phenotypes. A primary and unavoidable problem in such clinical research is missing data. Several methods exist for dealing with missing data; multiple imputation (MI) is widely used and is supported by many statistical packages. However, finding the best clustering after multiple imputation is a difficult problem.
Objective
In this study, we propose a procedure for clustering a large dataset with missing values. The main aim is to introduce a new, practical algorithm that derives a single clustering solution for a dataset in which missing values have been imputed multiple times.
Method
The first step of the algorithm consists of applying a multiple imputation technique. COPD is a multi-dimensional disease described by a large number of discrete and continuous variables. Factor analysis of mixed data (FAMD) was therefore used to reduce the complexity of the high-dimensional data. In the next step, several methods (k-means, hierarchical and model-based) were applied to cluster each imputed dataset. Combining multiple clustering results into a single solution is an important statistical challenge. Our proposal for pooling the clustering results derived from each imputed dataset was based on maximum likelihood estimation of a multivariate multinomial mixture model, fitted with the EM algorithm. The results were then compared with those of other methods (i.e. majority vote and fuzzy k-means). The main difficulty in this procedure was that cluster analysis involves many technical decisions, so various algorithms can be defined and compared.
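The pooling step can be illustrated with a minimal sketch. It is not the authors' implementation, only one plausible reading of a multivariate multinomial mixture fitted by EM. It assumes that each of the m imputed datasets has already been clustered into the same number K of clusters, that the labels are stored as an n-by-m integer matrix, and that the EM responsibilities are initialised from the first imputation's labels. The function name `pool_clusterings`, the smoothing constant and the toy data are all hypothetical.

```python
import numpy as np


def pool_clusterings(labels, n_clusters, n_iter=200, tol=1e-8):
    """Pool m clusterings of n patients with a multinomial mixture (EM).

    labels: (n, m) integer array; labels[i, j] in 0..n_clusters-1 is the
    cluster of patient i in imputed dataset j.
    Returns (consensus labels, posterior class probabilities).
    """
    labels = np.asarray(labels)
    onehot = np.eye(n_clusters)[labels]          # shape (n, m, K)

    # Assumed initialisation: soft copy of the first imputation's labels.
    resp = 0.9 * onehot[:, 0, :] + 0.1 / n_clusters

    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: class weights and per-imputation label probabilities.
        weights = resp.mean(axis=0)              # shape (K,)
        theta = np.einsum("nk,njc->kjc", resp, onehot) + 1e-6
        theta /= theta.sum(axis=2, keepdims=True)

        # E-step: posterior probability of each latent class per patient.
        log_p = np.log(weights + 1e-300)[None, :] + np.einsum(
            "njc,kjc->nk", onehot, np.log(theta)
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)

        # Stop when the observed-data log-likelihood stops improving.
        ll = float(log_norm.sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return resp.argmax(axis=1), resp


# Toy data: 60 patients, 3 true groups, 5 imputations whose cluster
# labels are arbitrarily renumbered, with a few disagreements added.
true_groups = np.repeat([0, 1, 2], 20)
labels = np.stack([(true_groups + j) % 3 for j in range(5)], axis=1)
labels[0, 0] = 1
labels[25, 1] = 1
labels[50, 2] = 0

consensus, resp = pool_clusterings(labels, n_clusters=3)
```

Because the mixture treats the labels of each imputation as a categorical variable, the clusters do not need to be numbered consistently across imputed datasets, which is one practical difference from a plain majority vote.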
Results
Simulation studies were conducted to illustrate the usefulness of our methodology compared with commonly used alternatives. Its practicality was also investigated by analysing data from the Pneumology Department of the University Hospital of Liege, with the aim of identifying clinical phenotypes among adults suffering from COPD.
Conclusions
In conclusion, the proposed procedure is practical and flexible, allowing the user to compare several methods in both the clustering and merging steps.