Characterization of neurodegenerative diseases with tree ensemble methods: the case of Alzheimer's disease

Wehenkel, Marie

Doctoral thesis (Dissertations and theses)

2018

Permalink
https://hdl.handle.net/2268/227796

Files

Full Text

Thesis.pdf

Publisher postprint (15.48 MB)

All documents in ORBi are protected by a user license.

Details

Keywords :

Machine learning; Alzheimer's disease; CAD systems; Tree ensemble methods; Random Forests; Group selection

Abstract :

[en] Over the last decade, the neuroscience field has seen the emergence of machine learning methods for the analysis of neuroimaging data. Unlike univariate methods, which consider voxels one by one, these techniques analyse relationships between several voxels and can detect multivariate patterns. In the context of neurodegenerative diseases, such as Alzheimer’s disease (AD), they can be used to design a diagnostic system and to identify in neuroimages the patterns responsible for the disease. The work presented here thus belongs to the field of pattern recognition in neuroimaging. Our objective is to explore the possibilities that tree ensemble methods, such as Random Forests, offer in this domain in general, and in the context of AD research in particular. These methods suit the needs of this domain very well, as they combine very good predictive performance with interpretable results in the form of variable importance scores. Our contributions include both methodological developments around tree ensemble methods and applications of these methods to real datasets. The methodological part of the thesis focuses on the analysis and improvement of Random Forests variable importance scores for neuroimaging problems. Typical datasets in this domain are of very high dimensionality (hundreds of thousands of voxels) and contain comparatively very few samples (tens or hundreds of patients). Our first contribution is a theoretical and empirical analysis of how importance scores behave in such extreme settings, depending on the method parameters. We then propose several improvements to importance scores in such settings that take advantage of either the spatial structure between the features or a pre-defined partitioning of these features into groups. Finally, we address an issue with Random Forests importances: finding a threshold that separates truly relevant variables from irrelevant ones.
For this purpose, we adapt several statistical methods proposed in the bioinformatics literature. These methods are extended to compute a statistical score for groups of features instead of individual features. This adaptation at the group level was motivated by our expectation that a disease is explained by groups of voxels rather than by isolated voxels. We show that working at the group level leads to higher statistical power than working at the feature level. The approach is applied to a real dataset for the prognosis of AD, where it is shown to highlight brain regions that are consistent with results in the literature. In the second part of the thesis, we present different applications of Random Forests for AD research. First, we use tree-based ensemble methods to clinically characterize two different metabolic profiles observed in PET scans of AD patients. Second, we carry out an empirical comparison showing that Random Forests are competitive with linear methods, in terms of accuracy and interpretability, on different real datasets related to three research questions about AD: the diagnosis of demented patients, the prognosis of patients with mild cognitive impairment (MCI), and the differentiation of MCI and AD patients.
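
To make the idea of a group-level statistical score more concrete, here is a minimal sketch; it is not the thesis's implementation. It assumes a label-permutation test, which is only one possible choice among the statistical methods the abstract alludes to. To stay dependency-free, it also replaces Random Forests importances with a simple stand-in score (the absolute difference of class means per feature). The function names, the group names and the toy data are all hypothetical.

```python
import random


def feature_scores(X, y):
    """Stand-in importance: |mean(class 1) - mean(class 0)| per feature.

    Assumption: a real analysis would use Random Forests importances here.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return scores


def group_scores(scores, groups):
    """Aggregate feature scores into one score per group (here: the sum)."""
    return {name: sum(scores[j] for j in idx) for name, idx in groups.items()}


def group_p_values(X, y, groups, n_permutations=200, seed=0):
    """Empirical p-value per group, from a null built by shuffling labels."""
    rng = random.Random(seed)
    observed = group_scores(feature_scores(X, y), groups)
    exceed = {name: 0 for name in groups}
    labels = list(y)
    for _ in range(n_permutations):
        rng.shuffle(labels)  # breaks any link between features and labels
        null = group_scores(feature_scores(X, labels), groups)
        for name in groups:
            if null[name] >= observed[name]:
                exceed[name] += 1
    # Add-one correction keeps every p-value strictly positive.
    return {name: (exceed[name] + 1) / (n_permutations + 1) for name in groups}


# Toy data (hypothetical): 20 "patients", 6 "voxels" in 3 groups.
# Only the two voxels of region_a carry a class-dependent shift.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = []
for label in y:
    row = [data_rng.gauss(0.0, 1.0) for _ in range(6)]
    row[0] += 3.0 * label
    row[1] += 3.0 * label
    X.append(row)

groups = {"region_a": [0, 1], "region_b": [2, 3], "region_c": [4, 5]}
p_values = group_p_values(X, y, groups)
for name in sorted(p_values):
    print(name, round(p_values[name], 3))
```

Summing the scores within a group is only one possible aggregation; the mean or the maximum could be substituted in `group_scores` without changing the rest of the procedure.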

Research Center/Unit :

GIGA CRC (Cyclotron Research Center) In vivo Imaging-Aging & Memory - ULiège

Disciplines :

Engineering, computing & technology: Multidisciplinary, general & others

Author, co-author :

Wehenkel, Marie ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Characterization of neurodegenerative diseases with tree ensemble methods: the case of Alzheimer's disease

Defense date :

17 September 2018

Number of pages :

146 + 25

Institution :

ULiège - Université de Liège

Degree :

Docteur en Sciences de l'ingénieur (Doctor of Engineering Sciences)

Promotor :

Phillips, Christophe ; Université de Liège - ULiège > GIGA > GIGA CRC In vivo Imaging - Neuroimaging, data acquisition and processing

Geurts, Pierre ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

President :

Ernst, Damien ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Jury member :

Bastin, Christine ; Université de Liège - ULiège > GIGA > GIGA CRC In vivo Imaging - Aging & Memory

Louppe, Gilles ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Big Data

Saeys, Yvan

Bzdok, Danilo

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique

Available on ORBi :

since 14 September 2018

Statistics

Number of views

578 (71 by ULiège)

Number of downloads

666 (38 by ULiège)
