Unpublished conference/Abstract (Scientific congresses and symposiums)
Cluster Analysis in Incomplete Data
Nekoee Zahraei, Halehsadat; Louis, Renaud; Donneau, Anne-Françoise
2020, 41st Annual Conference of the International Society for Clinical Biostatistics
 

Details



Abstract :
Background: Cluster analysis is one of the most widely applied methods for finding structure in a dataset. It divides multi-dimensional data into homogeneous groups such that subjects in the same group have similar properties. Missing values, however, are an unavoidable part of multi-dimensional data. Although several methods now make missing data easy to handle, a clustering approach still has to account for how the missing values were managed.
Objective: Multiple imputation is a simple but powerful method for handling missing data. However, applying multiple imputation raises several challenges for clustering. The objective of this research was to introduce an efficient framework for applying cluster analysis to incomplete datasets by using multiple imputation. By simulating different scenarios inspired by real data, the proposed method addresses some limitations in the statistical literature in order to find highly discriminating clusters.
Method: In the first step of multiple imputation, m imputed datasets were generated. Variable reduction methods and cluster analysis strategies were then applied to each imputed dataset, and a cluster assignment was computed for each. To combine these assignments, a finite mixture of multivariate multinomial distributions was proposed to estimate the number of clusters; the final cluster assignment of each observation was obtained by maximum likelihood estimation via the EM algorithm.
Results: Motivated by real datasets, data for 178 subjects with mixed continuous and categorical variables and two known clusters were generated from normal and multivariate mixture distributions, respectively. Several scenarios were defined for different percentages of missingness (e.g. 25%, 50%, 75%) and of overlap between the two known clusters (e.g. 30%, 45%, 65%). In addition, different imputation, variable reduction and clustering methods were compared. The results showed that the proposed method achieved higher discrimination and better matching with the known clusters than the other methods. The best method, based on multiple imputation, variable reduction and the proposed combination method, was then applied to real data from the Pneumology Department of the University Hospital of Liège, with the aim of identifying clinical phenotypes among adults suffering from chronic obstructive pulmonary disease.
Conclusions: Based on a large simulation study, the proposed method yielded the best discrimination and the highest matching between the final clustering result and the known clustering of the simulated dataset.
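The combination step in the Method section can be illustrated with a small sketch. The abstract gives no implementation details, so everything here is an assumption: the cluster labels obtained from the m imputed datasets are treated as m categorical variables per subject, and a finite mixture of independent multinomials (a latent class model) is fitted to them by EM. The names `fit_latent_class` and `bic` are hypothetical, the use of BIC to compare numbers of classes is an assumed choice, and the labels are simulated for illustration only; they are not the authors' data.

```python
import numpy as np

def fit_latent_class(labels, n_classes, n_iter=200, seed=0):
    """EM for a mixture of independent multinomials (assumed formulation).

    labels: (n, m) integer array; column j holds the cluster labels
    obtained from imputed dataset j.
    Returns (consensus labels, responsibilities, log-likelihood).
    """
    labels = np.asarray(labels)
    n_cat = int(labels.max()) + 1
    onehot = np.eye(n_cat)[labels]                  # shape (n, m, n_cat)
    rng = np.random.default_rng(seed)
    # Random soft start so that the classes are not identical.
    resp = rng.dirichlet(np.ones(n_classes), size=labels.shape[0])
    loglik = -np.inf
    for _ in range(n_iter):
        # M-step: mixing weights and per-class label probabilities.
        weights = resp.mean(axis=0)
        theta = np.einsum("ik,ijc->kjc", resp, onehot) + 1e-6
        theta /= theta.sum(axis=2, keepdims=True)
        # E-step: posterior class probabilities, computed on the log scale.
        log_p = np.log(weights + 1e-300) + np.einsum(
            "ijc,kjc->ik", onehot, np.log(theta))
        norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - norm)
        loglik = float(norm.sum())
    return resp.argmax(axis=1), resp, loglik

def bic(loglik, n, m, n_cat, n_classes):
    """BIC of the latent class model (assumed selection criterion)."""
    n_params = (n_classes - 1) + n_classes * m * (n_cat - 1)
    return -2.0 * loglik + n_params * np.log(n)

# Simulated example: 178 subjects, two known clusters, m = 5 imputed datasets.
rng = np.random.default_rng(42)
truth = np.repeat([0, 1], 89)
m = 5
labels = np.tile(truth[:, None], (1, m))
noise = rng.random(labels.shape) < 0.10             # 10% disagreeing labels
labels = np.where(noise, 1 - labels, labels)
labels[:, [1, 3]] = 1 - labels[:, [1, 3]]           # label switching in two runs

consensus, resp, loglik = fit_latent_class(labels, n_classes=2)
# Agreement with the known clusters, up to a relabelling of the two classes.
agreement = max(np.mean(consensus == truth), np.mean(consensus != truth))

bics = {}
for k in (1, 2, 3):
    _, _, ll = fit_latent_class(labels, n_classes=k)
    bics[k] = bic(ll, n=labels.shape[0], m=m, n_cat=2, n_classes=k)
print(agreement, bics)
```

One reason a mixture model is convenient here is that it does not require cluster labels to be aligned across imputed datasets: a cluster called 0 in one run and 1 in another is absorbed into the class-specific label probabilities.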
Disciplines :
Public health, health care sciences & services
Author, co-author :
Nekoee Zahraei, Halehsadat; Université de Liège - ULiège > Département des sciences de la santé publique > Biostatistique
Louis, Renaud; Université de Liège - ULiège > Département des sciences cliniques > Pneumologie - Allergologie
Donneau, Anne-Françoise; Université de Liège - ULiège > Département des sciences de la santé publique > Biostatistique
Language :
English
Title :
Cluster Analysis in Incomplete Data
Publication date :
August 2020
Event name :
41st Annual Conference of the International Society for Clinical Biostatistics
Event place :
Krakow, Poland
Event date :
23-08-2020 to 27-08-2020
Audience :
International
Available on ORBi :
since 18 November 2020
