Doctoral thesis (Dissertations and theses)
Genomic Association Screening Methodology for High-Dimensional and Complex Data Structures: Detecting n-Order Interactions
Mahachie John, Jestinah



Abstract :
[en] We developed a data-mining method, Model-Based Multifactor Dimensionality Reduction (MB-MDR), to detect epistatic interactions for different types of traits. MB-MDR enables the fast identification of gene-gene interactions among thousands of SNPs, without the need to make restrictive assumptions about the genetic modes of inheritance. This thesis focused primarily on MB-MDR for quantitative traits: its performance and its application to a variety of data problems. We carried out several simulation studies to evaluate quantitative MB-MDR in terms of power and type I error when data are noisy, non-normal or skewed, and when important main effects are present. Firstly, we assessed the performance of MB-MDR in the presence of noisy data. The error sources considered were missing genotypes, genotyping error, phenotypic mixtures and genetic heterogeneity. Results from this study showed that MB-MDR is least affected by small percentages of missing data and genotyping errors, but is strongly affected by phenotypic mixtures and genetic heterogeneity. This is in line with a similar study performed for binary traits. Although both Multifactor Dimensionality Reduction (MDR) and MB-MDR are data reduction techniques with a common basis, their ways of deriving significant interactions are substantially different. Nevertheless, the effects on power of introducing error sources were quite similar. Irrespective of the trait under consideration, epistasis screening methodologies such as MB-MDR and MDR mainly suffer from the presence of phenotypic mixtures and genetic heterogeneity. Secondly, we extensively addressed the issue of adjusting for lower-order genetic effects during epistasis screening, using different adjustment strategies for SNPs in the functional SNP-SNP interaction pair, and/or for additional important SNPs.
Since, in this thesis, we restrict attention to 2-locus interactions only, adjustment for lower-order effects always (and only) implies adjustment for main genetic effects. Unfortunately, most data dimensionality reduction techniques based on MDR do not explicitly require that lower-order effects are included in the ‘model’ when investigating higher-order effects (a prerequisite for most traditional, especially regression-based, methods). However, epistasis results may be hampered by the presence of significant lower-order effects. Results from this study showed severely inflated type I error rates when main effects were not taken into account or were not properly accounted for. We observed that additive coding (the most commonly used coding in practice) in main effects adjustment does not remove all of the potential main effects that deviate from additive genetic variance. In addition, adjusting for main effects prior to MB-MDR (via a regression framework), whatever coding is adopted, does not control type I error in all scenarios. From this study, we concluded that correction for lower-order effects should preferably be done via codominant coding, to reduce the chance of false positive epistasis findings. The recommended way of performing an MB-MDR epistasis screening is to always adjust the analysis for lower-order effects of the SNPs under investigation, “on-the-fly”. This correction avoids overcorrection for other SNPs, which are not part of the interacting SNP pair under study. Thirdly, we assessed the cumulative effect of trait deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR to detect 2-locus epistasis signals in the absence of main effects. Although MB-MDR itself is a non-parametric method, in the sense that no assumptions are made regarding genetic modes of inheritance, the data reduction part in MB-MDR relies on association tests.
In particular, for quantitative traits, MB-MDR by default uses Student’s t-test (steps 1 and 2 of MB-MDR). Also, when correcting for lower-order effects during quantitative MB-MDR analysis, we implicitly operate within a regression framework. Since Student’s t-statistic is the square root of the corresponding ANOVA F-statistic, ANOVA assumptions have to be met for MB-MDR to give valid results. Therefore, we simulated data from normal and non-normal distributions, with constant and non-constant variances, and performed association tests via Student’s t-test as well as the unequal variance t-test, commonly known as Welch’s t-test. Somewhat surprisingly at first, the results of this study showed that MB-MDR maintains adequate type I error rates, irrespective of the data distribution or the association test used. On the other hand, MB-MDR yields lower power for non-normal data than for normal data. With respect to the association tests used within MB-MDR, in most cases Welch’s t-test led to lower power than Student’s t-test. To maintain the balance between power and type I error, we concluded that when performing MB-MDR analysis with quantitative traits, one ideally first rank-transforms traits to normality and then applies MB-MDR modeling with Student’s t-test as the association test. Clearly, before embarking on using a method in practice, there is a need to extensively check the applicability of the method to the data at hand. This is a common practice in biostatistics, but often a forgotten standard operating procedure in genetic epidemiology, in particular in genome-wide association interaction (GWAI) studies. In addition to the presentation of extensive simulation studies, we also presented some MB-MDR applications to real-life data problems. These involved MB-MDR analyses of quantitative as well as binary complex disease traits, primarily in the context of asthma/allergy and Crohn’s disease.
In two of the presented analyses, MB-MDR confirmed logistic regression and transmission disequilibrium test (TDT) results. Part of the aforementioned methodological developments was initiated on the basis of observations of MB-MDR behavior on real-life data. Both the practical and theoretical components of this thesis confirm our belief in the potential of MB-MDR as a promising and versatile tool for the identification of epistatic effects, irrespective of the design (family-based or unrelated individuals) and irrespective of the targeted disease trait (binary, continuous, censored, categorical, multivariate). A thorough characterization of the different faces of MB-MDR to which this versatility gives rise is work in progress.
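Two points from the abstract can be illustrated with a small, self-contained sketch. It is not the MB-MDR software and does not reproduce the thesis simulations. All function names are hypothetical. The adjustment is shown for a single SNP rather than a SNP pair. The rank offset `(rank - 0.5) / n` is one common choice and is assumed here, since the abstract does not specify the transform. The first part shows that residuals from an additive (0/1/2) adjustment can retain a genotype-specific signal when the main effect is dominant, whereas codominant adjustment removes it. The second part rank-transforms a skewed trait to normality, computes Student's and Welch's t statistics, and checks that the squared Student's t equals the two-group ANOVA F.

```python
import random
from statistics import NormalDist, mean, variance


def additive_residuals(trait, genotype):
    """Residuals after a simple linear regression on the 0/1/2 allele count."""
    g_bar, y_bar = mean(genotype), mean(trait)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotype, trait))
    sxx = sum((g - g_bar) ** 2 for g in genotype)
    slope = sxy / sxx
    return [y - (y_bar + slope * (g - g_bar)) for g, y in zip(genotype, trait)]


def codominant_residuals(trait, genotype):
    """Residuals after removing a separate mean for each genotype class."""
    group_mean = {g: mean(y for y, gi in zip(trait, genotype) if gi == g)
                  for g in set(genotype)}
    return [y - group_mean[g] for g, y in zip(genotype, trait)]


def rank_to_normal(values):
    """Rank-based inverse normal transform (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = NormalDist().inv_cdf((rank - 0.5) / n)  # assumed offset
    return out


def student_t(a, b):
    """Pooled-variance (Student) two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def welch_t(a, b):
    """Unequal-variance (Welch) two-sample t statistic."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


def anova_f(a, b):
    """One-way ANOVA F statistic for two groups (1 and N - 2 degrees of freedom)."""
    grand = mean(a + b)
    between = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    within = (len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)
    return between / (within / (len(a) + len(b) - 2))


random.seed(1)

# Part 1: hypothetical purely dominant main effect (carriers share one mean).
genotype = [random.choice([0, 1, 2]) for _ in range(3000)]
trait = [(1.0 if g > 0 else 0.0) + random.gauss(0, 1) for g in genotype]


def class_means(residuals):
    """Mean residual within each genotype class."""
    return {g: mean(r for r, gi in zip(residuals, genotype) if gi == g)
            for g in (0, 1, 2)}


add_means = class_means(additive_residuals(trait, genotype))
cod_means = class_means(codominant_residuals(trait, genotype))
print("additive residual means by genotype:  ", add_means)
print("codominant residual means by genotype:", cod_means)

# Part 2: skewed trait, rank-transformed, then compared across two groups.
skewed = [random.expovariate(1.0) for _ in range(200)]
z = rank_to_normal(skewed)
group_a, group_b = z[:80], z[80:]
t = student_t(group_a, group_b)
w = welch_t(group_a, group_b)
f = anova_f(group_a, group_b)
print("Student t:", t, "Welch t:", w, "t^2:", t ** 2, "ANOVA F:", f)
```

In this sketch the codominant residuals have zero mean within every genotype class by construction, while the additive residuals do not; this leftover main-effect signal is what the abstract warns can surface as false positive epistasis.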
Disciplines :
Computer science
Author, co-author :
Mahachie John, Jestinah ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Defense date :
20 December 2012
Institution :
ULiège - Université de Liège
Degree :
Doctorale en sciences de l'ingenieur (élec & électro)
Available on ORBi :
since 11 December 2012

