Article (Scientific journals)
Exploiting SNP Correlations within Random Forest for Genome-Wide Association Studies
Botta, Vincent; Louppe, Gilles; Geurts, Pierre et al.
2014, in PLoS ONE
Peer Reviewed verified by ORBi
 

Files


Full Text
journal.pone.0093379.pdf
Publisher postprint (646.07 kB)

Details



Keywords :
machine learning; data mining; random forest; SNP; Genome-wide association studies; genetics; linkage disequilibrium; correlation; decision trees
Abstract :
The primary goal of genome-wide association studies (GWAS) is to discover variants that could lead, in isolation or in combination, to a particular trait or disease. Standard approaches to GWAS, however, are usually based on univariate hypothesis tests and therefore can account neither for correlations due to linkage disequilibrium nor for combinations of several markers. To discover and leverage such potential multivariate interactions, we propose in this work an extension of the Random Forest algorithm tailored for structured GWAS data. In terms of risk prediction, we show empirically on several GWAS datasets that the proposed T-Trees method significantly outperforms both the original Random Forest algorithm and standard linear models, thereby suggesting the actual existence of multivariate non-linear effects due to the combinations of several SNPs. We also demonstrate that variable importances as derived from our method can help identify relevant loci. Finally, we highlight the strong impact that quality control procedures may have, both in terms of predictive power and loci identification.
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Botta, Vincent; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation
Louppe, Gilles; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation
Geurts, Pierre; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Algorith. des syst. en interaction avec le monde physique
Wehenkel, Louis; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation
Language :
English
Title :
Exploiting SNP Correlations within Random Forest for Genome-Wide Association Studies
Publication date :
02 April 2014
Journal title :
PLoS ONE
eISSN :
1932-6203
Publisher :
Public Library of Science, San Francisco, United States - California
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBi :
since 23 April 2014
