Clustering Algorithm in Presence of Missing Data

Nekoee Zahraei, Halehsadat; LOUIS, Renaud; Donneau, Anne-Françoise

No full text

Unpublished conference/Abstract (Scientific congresses and symposiums)

2019 • Royal Statistical Society of Belgium (RSSB)

Permalink
https://hdl.handle.net/2268/240652

Details

Keywords :

Missing data; Multiple Imputation; Cluster Analysis; Multivariate Multinomial Model; EM Algorithm

Abstract :

[en] Introduction: Cluster analysis is a well-known method for exploring similarity between patients. Recently, attempts have been made to identify homogeneous phenotypes in chronic obstructive pulmonary disease (COPD). A primary and unavoidable problem in such clinical research is missing data. Several methods exist to deal with missing data. Multiple imputation (MI) is widely used for this purpose and is supported by many statistical packages. However, finding the best clustering after applying multiple imputation is a difficult problem.

Objective: In this study, we propose a procedure for clustering large datasets with missing values. The main focus of this project is to introduce a new practical algorithm that derives a single clustering solution for a dataset in which missing values were imputed multiple times.

Method: The first step of the algorithm consists of applying a multiple imputation technique. COPD is a multi-dimensional disease described by a large number of discrete and continuous variables. Therefore, factor analysis of mixed data (FAMD) was used to reduce the complexity of the high-dimensional data. In the next step, several methods (k-means, hierarchical and model-based) were applied to cluster the imputed datasets. Combining multiple clustering results into a single solution is an important statistical challenge. Our proposal for pooling the clustering results derived from each imputed dataset was based on maximum likelihood estimation of a multivariate multinomial mixture model, fitted with the EM algorithm. The results were then compared with those of other methods (i.e. majority vote and fuzzy k-means). The main difficulty in this procedure was that cluster analysis involves many technical decisions; therefore, various algorithms can be defined and compared.

Results: Simulation studies were conducted to illustrate the usefulness of our methodology against commonly used alternative models. In addition, its practicality was investigated by analysing data from the Pneumology Department of the University Hospital of Liège, with the aim of identifying clinical phenotypes among adults suffering from COPD.

Conclusions: The proposed procedure is practical and flexible, allowing the user to compare several methods in the clustering and merging steps.
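
The pooling step described in the Method part can be illustrated with a minimal sketch. It is not the authors' implementation, which is not available with this record. It assumes that every imputed dataset has already been clustered into the same number of groups, that labels are integers starting at 0, and that label names need not agree across imputations. The function name `pool_labels_em`, its parameters, and the choice to start EM from each imputation's own partition are all assumptions made for this illustration. The model treats each patient's vector of cluster labels as a draw from a mixture of products of multinomial distributions, and the consensus cluster is the mixture component with the highest posterior probability.

```python
import numpy as np

def pool_labels_em(labels, n_iter=200, tol=1e-8, smooth=1e-6):
    """Pool cluster labels from several imputed datasets (illustrative sketch).

    labels: integer array of shape (n_samples, n_imputations); column j holds
            the cluster labels obtained on imputed dataset j.
    Returns the consensus labels and the log-likelihood of the best EM run.
    """
    labels = np.asarray(labels)
    n, m = labels.shape
    k = int(labels.max()) + 1              # assumed: same K for every imputation
    onehot = np.eye(k)[labels]             # shape (n, m, k)

    best_ll, best_resp = -np.inf, None
    # Assumption: one EM run per imputation, started from that partition.
    for j in range(m):
        resp = 0.9 * onehot[:, j, :] + 0.1 / k   # soft responsibilities, rows sum to 1
        prev_ll = -np.inf
        for _ in range(n_iter):
            # M-step: mixing weights and per-component label probabilities.
            weights = resp.mean(axis=0)
            theta = np.einsum("ig,ijc->gjc", resp, onehot) + smooth
            theta /= theta.sum(axis=2, keepdims=True)
            # E-step: posterior probability of each component for each sample.
            log_p = np.log(weights + 1e-300) + np.einsum(
                "ijc,gjc->ig", onehot, np.log(theta))
            log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
            resp = np.exp(log_p - log_norm)
            ll = float(log_norm.sum())
            if abs(ll - prev_ll) < tol:
                break
            prev_ll = ll
        if ll > best_ll:
            best_ll, best_resp = ll, resp
    return best_resp.argmax(axis=1), best_ll

# Toy data (invented for illustration): 12 samples, 3 groups, 4 imputations.
# Label names are permuted between imputations and two entries are noisy.
truth = np.repeat([0, 1, 2], 4)
labels = np.column_stack([
    truth,
    np.array([1, 2, 0])[truth],
    np.array([2, 0, 1])[truth],
    truth,
])
labels[3, 2] = 0
labels[8, 3] = 1

consensus, loglik = pool_labels_em(labels)
```

Because the mixture components are learned from co-occurrence patterns, this kind of pooling does not require the labels of the individual clusterings to be aligned first, unlike a plain majority vote. The consensus labels are themselves only defined up to a renaming of the clusters.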

Disciplines :

Public health, health care sciences & services

Author, co-author :

Nekoee Zahraei, Halehsadat ; Université de Liège - ULiège > Département des sciences de la santé publique > Biostatistique

LOUIS, Renaud ; Centre Hospitalier Universitaire de Liège - CHU > Département des Services Logistiques > Secteur gardiennage

Donneau, Anne-Françoise ; Université de Liège - ULiège > Département des sciences de la santé publique > Biostatistique

Language :

English

Title :

Clustering Algorithm in Presence of Missing Data

Publication date :

16 October 2019

Event name :

Royal Statistical Society of Belgium (RSSB)

Event date :

from 16-10-2019 to 18-10-2019

Available on ORBi :

since 28 October 2019

Statistics

Number of views

145 (14 by ULiège)

Number of downloads

0 (0 by ULiège)
