No full text
Unpublished conference/Abstract (Scientific congresses and symposiums)
Using multiple imputation for cluster analysis with incomplete data
Nekoee Zahraei, Halehsadat; LOUIS, Renaud; Donneau, Anne-Françoise
2020, Colloquium Methodology and Statistics
 

Details



Keywords :
Cluster analysis; Missing values; Consensus clustering
Abstract :
[en] Cluster analysis is one of the most widely applied methods for finding structure in a dataset: it divides observations into homogeneous groups such that observations within a group share similar properties while observations in different groups have different characteristics. Missing values are a general and unavoidable challenge in real-world data analysis and are not specific to cluster analysis. Combining methods for handling missing data with cluster analysis raises new challenges across these two popular fields. Unlike traditional approaches that exclude incomplete observations or impute each missing value only once, multiple imputation is generally considered the preferred method for handling missing values, but it introduces complications when applied together with cluster analysis. Indeed, in line with Rubin’s rules, the results obtained on the imputed datasets must be combined in the last step of the multiple imputation process, and for cluster analysis this requires a consensus clustering method. In addition, the number of variables included in the cluster analysis is an important and influential issue. Therefore, the present study attempts to introduce a new framework for cluster analysis that combines multiple imputation and variable reduction, and proposes a new model based on the mixture multivariate multinomial model (4M) to solve the problem in the consensus step. Although handling missing values and variable reduction in cluster analysis pose many challenges, there is no comprehensive study comparing classification ability at each step: (1) handling missing values, (2) dimension reduction, (3) general clustering methods, and (4) consensus clustering. As a second aim of this study, the discriminating power of the proposed framework is evaluated on simulated datasets under various scenarios and compared with commonly used existing methods for missing values, variable reduction, and consensus clustering.
Finally, the introduced framework was applied to data from patients suffering from chronic obstructive pulmonary disease recruited in the Pneumology Department of the University Hospital of Liege.
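The pipeline outlined in the abstract (impute several times, cluster each completed dataset, then combine the partitions) can be illustrated with a minimal Python sketch. It is not the authors' framework: the imputation is a crude random hot-deck draw, the clustering is plain k-means, and the consensus step uses a simple co-association rule rather than the proposed 4M model. All function names, the toy data, and the 0.5 threshold are assumptions made for illustration only.

```python
import random
from itertools import combinations

def impute_once(rows, rng):
    """Fill each None with a random draw from the observed values of its column
    (a crude hot-deck stand-in for a proper imputation model)."""
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(row)] for row in rows]

def kmeans(rows, k, rng, n_iter=50):
    """Plain Lloyd's algorithm with random initial centers; returns cluster labels."""
    centers = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])))
                  for row in rows]
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus(label_sets, threshold=0.5):
    """Co-association consensus: link two observations when they share a cluster
    in more than `threshold` of the partitions, then return connected components.
    Only co-membership is used, so label switching between partitions is harmless."""
    n = len(label_sets[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        share = sum(labels[i] == labels[j] for labels in label_sets) / len(label_sets)
        if share > threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy data (hypothetical): two groups, one missing value in some rows.
data = [
    [0.0, 0.1, 0.2], [0.2, None, 0.1], [None, 0.0, 0.3], [0.1, 0.2, None],
    [5.0, 5.1, 4.8], [5.2, None, 5.0], [None, 4.9, 5.1], [5.1, 5.0, None],
]
rng = random.Random(0)
m = 5  # number of imputed datasets
partitions = [kmeans(impute_once(data, rng), k=2, rng=rng) for _ in range(m)]
final = consensus(partitions)
print(final)
```

The consensus step is needed because cluster labels from different imputed datasets cannot be averaged the way parameter estimates are pooled; comparing co-memberships avoids having to match labels across partitions. In a real analysis each component would be replaced by a more suitable method, for instance a model-based imputation and, following the abstract, the 4M model in the consensus step.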
Disciplines :
Mathematics
Author, co-author :
Nekoee Zahraei, Halehsadat ;  Université de Liège - ULiège > Département des sciences de la santé publique > Biostatistique
LOUIS, Renaud;  Université de Liège - ULiège > Département des sciences cliniques > Pneumologie - Allergologie
Donneau, Anne-Françoise ;  Université de Liège - ULiège > Département des sciences de la santé publique > Biostatistique
Language :
English
Title :
Using multiple imputation for cluster analysis with incomplete data
Publication date :
17 November 2020
Event name :
Colloquium Methodology and Statistics
Event organizer :
Maastricht University
Event date :
17 November 2020
By request :
Yes
Audience :
International
Available on ORBi :
since 18 November 2020

