No full text
Unpublished conference/Abstract (Scientific congresses and symposiums)
Clustering Algorithm in Presence of Missing Data
Nekoee Zahraei, Halehsadat; LOUIS, Renaud; Donneau, Anne-Françoise
2019, Royal Statistical Society of Belgium (RSSB)
 


Details



Keywords :
Missing data; Multiple Imputation; Cluster Analysis; Multivariate Multinomial Model; EM Algorithm
Abstract :
[en] Introduction: Cluster analysis is a well-known method for exploring similarity between patients. Recently, attempts have been made to identify homogeneous phenotypes in chronic obstructive pulmonary disease (COPD). A primary and unavoidable problem in such clinical research is missing data. Several methods exist to deal with the missing data problem. Multiple imputation (MI) is widely used for handling missing data and is supported by many statistical packages. However, finding the best clustering after multiple imputation is a difficult problem.
Objective: In this study, we propose a procedure for clustering a large dataset with missing values. The main focus of this project is to introduce a new practical algorithm that derives a single clustering solution for a dataset in which missing values were imputed multiple times.
Method: The first step of the algorithm consists of applying a multiple imputation technique. COPD is a multi-dimensional disease described by a large number of discrete and continuous variables. Therefore, factor analysis of mixed data (FAMD) was used to reduce the complexity of the high-dimensional data. In the next step, several methods (k-means, hierarchical and model-based) were applied to cluster each imputed dataset. Combining multiple clustering results into a single solution is an important statistical challenge. Our proposal for pooling the clustering results derived from each imputed dataset was based on maximum likelihood estimation of a multivariate multinomial mixture model, fitted with the EM algorithm. The results obtained were then compared to those of other pooling methods (i.e. majority vote and fuzzy k-means). The main difficulty in this procedure was that the cluster analysis involved many technical decisions; therefore, various algorithms can be defined and compared.
Results: Simulation studies were conducted to illustrate the usefulness of our methodology against commonly used alternative models. The practicality of the procedure was also investigated by analysing data from the Pneumology Department of the University Hospital of Liège, with the aim of identifying clinical phenotypes among adults suffering from COPD.
Conclusions: Our proposed procedure is practical and flexible, allowing the user to compare several methods in the clustering and merging steps.
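The pooling step described in the Method can be illustrated with a minimal sketch. It assumes that each of the M imputed datasets has already been clustered, so that every patient carries M cluster labels, and it fits a mixture of independent multinomial (categorical) distributions to those label vectors with EM. The function name `pool_partitions_em`, its parameters, the smoothing constant and the toy data are hypothetical choices for illustration; this is not the authors' implementation.

```python
import numpy as np

def pool_partitions_em(labels, n_clusters, n_iter=200, tol=1e-8, seed=0):
    """Pool M partitions into one via EM on a multinomial mixture (sketch).

    labels: integer array of shape (n_samples, M); labels[i, m] is the
            cluster assigned to sample i in imputed dataset m.
    Returns (pooled_labels, responsibilities, log_likelihood).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_samples, _ = labels.shape
    n_categories = labels.max() + 1
    # One-hot encode the labels: shape (n_samples, M, n_categories).
    onehot = np.eye(n_categories)[labels]
    # Random soft assignment of samples to consensus clusters.
    resp = rng.dirichlet(np.ones(n_clusters), size=n_samples)
    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(n_iter):
        # M-step: mixing weights and per-partition category probabilities.
        # The small constant (an assumption) avoids log(0).
        weights = resp.sum(axis=0) + 1e-6
        weights /= weights.sum()
        theta = np.einsum("ik,imc->kmc", resp, onehot) + 1e-6
        theta /= theta.sum(axis=2, keepdims=True)
        # E-step: posterior probability of each consensus cluster.
        log_p = np.log(weights)[None, :] + np.einsum(
            "imc,kmc->ik", onehot, np.log(theta)
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        ll = float(log_norm.sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return resp.argmax(axis=1), resp, ll

# Toy input: 6 patients, 3 imputed datasets. The partitions agree on the
# grouping but use different label names, as clustering runs often do.
toy_labels = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
])
pooled, responsibilities, log_lik = pool_partitions_em(toy_labels, n_clusters=2)
print(pooled)
```

Because the mixture treats each partition's labels as categorical observations, no label alignment across imputed datasets is needed beforehand, which is one practical difference from a plain majority vote.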
Disciplines :
Public health, health care sciences & services
Author, co-author :
Nekoee Zahraei, Halehsadat ;  Université de Liège - ULiège > Département des sciences de la santé publique > Biostatistique
LOUIS, Renaud ;  Centre Hospitalier Universitaire de Liège - CHU > Département des Services Logistiques > Secteur gardiennage
Donneau, Anne-Françoise ;  Université de Liège - ULiège > Département des sciences de la santé publique > Biostatistique
Language :
English
Title :
Clustering Algorithm in Presence of Missing Data
Publication date :
16 October 2019
Event name :
Royal Statistical Society of Belgium (RSSB)
Event date :
from 16-10-2019 to 18-10-2019
Available on ORBi :
since 28 October 2019

Statistics


Number of views
135 (14 by ULiège)
Number of downloads
0 (0 by ULiège)
