Background: In Genome-Wide Association Studies (GWAS), linkage disequilibrium (LD) is
important because it allows the identification of genetic markers that tag the actual
causal variants. In Genome-Wide Association Interaction Studies (GWAIS), similar
principles hold for pairs of causal variants. However, LD may also interfere with the
detection of genuine epistasis signals, because Gametic Phase Disequilibrium (GPD) and
interaction may be completely confounded. GPD may involve unlinked genetic markers,
even markers residing on different chromosomes.
In GWAIS, GPD is often eliminated via feature selection schemes or so-called pruning
algorithms, to obtain unconfounded epistasis results. However, little is known about the
optimal degree of GPD/LD-pruning that balances false positive control against
sufficient power of epistasis detection statistics. Here, we focus on Model-Based
Multifactor Dimensionality Reduction (MB-MDR) as one large-scale epistasis detection tool. Its
performance has been thoroughly investigated in terms of false positive control and
power, under a variety of scenarios involving different trait types and study designs, as
well as error-free and noisy data, but never with respect to multicollinear SNPs.
Results: Using real-life human LD patterns from a homogeneous subpopulation of
British ancestry, we investigated the impact of LD-pruning on the statistical sensitivity
of MB-MDR. We considered three different non-fully penetrant epistasis models with
varying effect sizes. There is a clear advantage to pre-analysis pruning with sliding
windows at an r² threshold of 0.75 or lower, but a threshold of 0.20 has a detrimental
effect on the power to detect a functional interactive SNP pair (power <25%). Signal
sensitivity, which directly uses LD-block information to determine whether an epistasis
signal is present, also benefits from LD-pruning (average power across scenarios: 87%),
but is largely hampered by functional loci residing at the boundaries of an LD block.
Conclusions: Our results confirm that LD patterns and the position of causal variants
within LD blocks affect epistasis detection, and that pruning strategies and LD-block
definitions together need careful attention if we wish to maximize the power of
large-scale epistasis screenings.
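
The sliding-window pruning at an r² threshold mentioned in the Results can be illustrated with a minimal Python sketch. It is not the procedure used in the study: it assumes genotypes coded as 0/1/2 minor-allele counts, takes r² to be the squared Pearson correlation between two such vectors, and uses a simple greedy rule that keeps the first SNP of each correlated pair. The function names and the window and step sizes are hypothetical choices for illustration; tools such as PLINK offer comparable window, step and r² parameters but apply their own removal rules.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated here as no LD
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(genotypes, window=50, step=5, r2_max=0.75):
    """Greedy sliding-window LD pruning (illustrative only).

    genotypes: list of SNPs in map order, each a list of 0/1/2 counts per individual.
    Returns the indices of the SNPs that are kept.
    """
    n = len(genotypes)
    keep = [True] * n
    start = 0
    while start < n:
        end = min(start + window, n)
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                # Drop the later SNP of any pair whose r² exceeds the threshold.
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    keep[j] = False
        if end == n:
            break
        start += step  # slide the window forward
    return [i for i in range(n) if keep[i]]


# Toy data for eight individuals: snp_b is almost a copy of snp_a, snp_c is unrelated.
snp_a = [0, 1, 2, 0, 1, 2, 0, 1]
snp_b = [0, 1, 2, 0, 1, 2, 0, 2]
snp_c = [2, 0, 1, 1, 0, 2, 1, 0]
kept = ld_prune([snp_a, snp_b, snp_c], r2_max=0.75)
print(kept)
```

Lowering `r2_max` removes more SNPs. The abstract reports that this helps down to a point, but that a threshold as strict as 0.20 reduces the power to detect a functional SNP pair.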
Research center :
GIGA‐R - Giga‐Research - ULiège
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Joiret, Marc ; Université de Liège - ULiège > In silico medecine-Biomechanics Research Unit
Mahachie John, Jestinah ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Gusareva, Elena
Van Steen, Kristel ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Language :
English
Title :
Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies