Doctoral thesis (Dissertations and theses)
An efficient and flexible software tool for genome-wide association interaction studies
Van Lishout, François
2016
 

Full Text :
vanlishout_phd.pdf (author postprint, 4.79 MB)

Abstract :
The human genome consists of approximately 3.2 billion base pairs, of which about 62 million can vary from one individual to another. These variable base pairs are called single nucleotide polymorphisms (SNPs). It is well known that certain combinations of SNP values dramatically increase the risk of developing certain diseases, such as Crohn's disease, Alzheimer's disease, diabetes and cancer. However, many discoveries remain to be made, and specialized software is required for this task. It has been shown that individual SNPs cannot, on their own, account for much of the heritability. This PhD thesis is therefore dedicated to interaction studies, whose purpose is to identify pairs of SNPs and/or environmental factors that might regulate susceptibility to the disease under investigation. Model-Based Multifactor Dimensionality Reduction (MB-MDR) is a powerful and flexible methodology for performing interaction analysis while minimizing the number of false discoveries. Before this thesis, the only available implementation was an R package that took days to analyze a dataset of just a few hundred SNPs. However, a typical dataset contains hundreds of thousands or millions of SNPs, even after data cleaning and quality control. The aim of this thesis is to develop software able to analyze such datasets with the MB-MDR methodology within a few days. In other words, the goal is to be about 10^8 times faster than the R package while remaining powerful and flexible and keeping the number of false discoveries low. Several contributions were needed to reach this goal, and they are presented in this thesis. First, new software was written from scratch in C++ so that every single computation could be optimized, instead of relying on overly generic functions as the R package did. Second, the methodology itself was improved, independently of the programming language.
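To make the pair-level idea concrete, the following Python sketch computes a simplified MB-MDR-style test statistic for a single SNP pair and a binary trait. It follows the general MB-MDR scheme as we understand it and is not the thesis's C++ code: each of the nine two-locus genotype cells (genotypes assumed coded 0, 1, 2) is tested against all other individuals with a 1-df chi-square test and labeled high risk (H), low risk (L) or no evidence (O) using an assumed threshold `alpha = 0.1`; the H cells and the L cells are then each pooled and tested against the rest, and the larger statistic is kept. The function names, the threshold, and the omission of refinements such as minimum cell sizes or covariate adjustment are all assumptions of this sketch.

```python
import math
from typing import Dict, Sequence, Tuple


def chi2_1df(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]]: rows = in group / not in group,
    columns = cases / controls. Returns 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(stat: float) -> float:
    """Upper-tail p-value of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


def mbmdr_pair_statistic(g1: Sequence[int], g2: Sequence[int],
                         y: Sequence[int], alpha: float = 0.1) -> float:
    """Simplified MB-MDR-style statistic for one SNP pair (hypothetical helper).

    g1, g2: genotypes coded 0/1/2; y: binary trait (1 = case, 0 = control).
    alpha: assumed significance threshold for labeling a cell H or L.
    """
    n_cases = sum(y)
    n_controls = len(y) - n_cases

    # Step 1: label each of the 9 two-locus genotype cells H, L or O.
    labels: Dict[Tuple[int, int], str] = {}
    for cell in [(i, j) for i in range(3) for j in range(3)]:
        members = [k for k in range(len(y)) if (g1[k], g2[k]) == cell]
        a = sum(y[k] for k in members)          # cases in the cell
        b = len(members) - a                    # controls in the cell
        c, d = n_cases - a, n_controls - b      # cases/controls elsewhere
        if p_value_1df(chi2_1df(a, b, c, d)) >= alpha:
            labels[cell] = "O"                  # no evidence
        elif a * d > b * c:
            labels[cell] = "H"                  # cases over-represented
        else:
            labels[cell] = "L"                  # cases under-represented

    # Step 2: pool all cells with a given label and test them against the rest.
    def pooled(label: str) -> float:
        in_group = [labels[(g1[k], g2[k])] == label for k in range(len(y))]
        a = sum(1 for k in range(len(y)) if in_group[k] and y[k] == 1)
        b = sum(in_group) - a
        return chi2_1df(a, b, n_cases - a, n_controls - b)

    # The pair statistic is the larger of the pooled H and L statistics.
    return max(pooled("H"), pooled("L"))
```

For example, if one genotype cell contains mostly cases while all other cells are balanced, only that cell is labeled H and the returned statistic equals the chi-square of that cell against everyone else; if every cell has the same case/control balance, the statistic is 0.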
MB-MDR relies on the maxT algorithm (introduced by Westfall & Young in 1993) to assess the significance of the results, and maxT can be customized for interaction analysis. A first major contribution of this PhD work, called Van Lishout's implementation of maxT, was introduced in 2011. Its parallel version makes it possible to analyze a dataset of hundreds of thousands of SNPs within a few days. The most important contribution of this thesis, called the gammaMAXT algorithm, was introduced in 2014. Its parallel version makes it possible to analyze a dataset of one million SNPs within a day. In this thesis, we also propose a new viewpoint for handling population stratification and correcting for covariates. Many analyses of simulated and real-life data are provided to highlight the flexibility of the software and its ability to find biologically interesting results. The latest version, called mbmdr-4.4.1.out, can be downloaded free of charge, together with its documentation, from http://www.statgen.ulg.ac.be.
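The significance-assessment step can be illustrated with a minimal Python sketch of the single-step maxT procedure of Westfall & Young: the trait is permuted many times, the maximum statistic over all tests is recorded for each permutation, and each observed statistic is compared with this permutation distribution of maxima, so that the adjusted p-values account for the number of tests performed. The callback `statistic`, the default of 999 permutations, and the `(1 + count) / (B + 1)` convention are illustrative assumptions. The sketch is deliberately naive and does not reproduce the thesis's optimized algorithms (Van Lishout's implementation of maxT or gammaMAXT).

```python
import random
from typing import Callable, List, Sequence


def maxt_adjusted_pvalues(statistic: Callable[[Sequence[int]], List[float]],
                          y: Sequence[int],
                          n_permutations: int = 999,
                          seed: int = 1) -> List[float]:
    """Single-step maxT adjusted p-values (Westfall & Young style sketch).

    statistic(y) must return one statistic per hypothesis (e.g. per SNP pair),
    computed with trait vector y; larger values mean stronger evidence.
    """
    rng = random.Random(seed)
    observed = statistic(y)
    exceed = [0] * len(observed)           # times the permutation max >= t_j
    y_perm = list(y)
    for _ in range(n_permutations):
        rng.shuffle(y_perm)                # break any genotype-trait association
        max_stat = max(statistic(y_perm))  # only the maximum is needed
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    # Count the observed data as one extra permutation so p-values are never 0.
    return [(1 + e) / (n_permutations + 1) for e in exceed]
```

In an interaction study, `statistic` would return one value per SNP pair (for example, an MB-MDR statistic), so every permutation requires a pass over all pairs; the cost of this step therefore grows with both the number of pairs and the number of permutations.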
Research center :
Giga-Genetics - ULiège
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Van Lishout, François ;  Université de Liège > Dép. d'électric., électron. et informat. (Inst. Montefiore)
Language :
English
Defense date :
14 June 2016
Institution :
ULiège - Université de Liège
Degree :
Doctor of Philosophy in Engineering Sciences
Promotor :
Van Steen, Kristel  ;  Université de Liège - ULiège > GIGA > GIGA Medical Genomics - Biostatistics, biomedicine and bioinformatics
President :
Boigelot, Bernard  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Jury member :
Wehenkel, Louis  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Farnir, Frédéric  ;  Université de Liège - ULiège > Fundamental and Applied Research for Animals and Health (FARAH)
König, Inke
van der Spek, Peter
Available on ORBi :
since 07 June 2016

