Article (Scientific journals)
Using supervised learning methods for gene selection in RNA-Seq case-control studies
Wenric, Stéphane; Shemirani, R.
2018. In Frontiers in Genetics, 9 (AUG)
Peer Reviewed verified by ORBi
 

Files


Full Text
fgene-09-00297.pdf
Publisher postprint (828.04 kB)





Details



Keywords :
Feature selection; Gene expression; Gene selection; Random forests; RNA-Seq; Supervised learning; Transcriptomics; Variational autoencoders
Abstract :
[en] Whole-transcriptome studies typically yield large amounts of data, with expression values for all genes or transcripts of the genome. Searching for genes of interest in a particular study setting can therefore be a daunting task, usually relying on automated computational methods. Moreover, most biological questions imply that such a search should be performed in a multivariate setting, to take into account the relationships between genes. Differential expression analysis commonly yields large lists of genes deemed significant, even after adjustment for multiple testing, leaving a very large number of candidates for subsequent study. Here, we explore the use of supervised learning methods to rank large ensembles of genes defined by their expression values measured with RNA-Seq in a typical two-class sample set. First, we use one of the variable importance measures generated by the random forests classification algorithm as a metric to rank genes. Second, we define the EPS (extreme pseudo-samples) pipeline, which uses VAEs (variational autoencoders) and regressors to extract a ranking of genes while leveraging the feature space of both virtual and comparable samples. We show that, on 12 cancer RNA-Seq datasets ranging from 323 to 1,210 samples, the random forests-based gene selection method and the EPS pipeline outperform differential expression analysis on 9 and 8 of the 12 datasets, respectively, in terms of identifying subsets of genes associated with survival. These results demonstrate the potential of supervised learning-based gene selection methods in RNA-Seq studies and highlight the need to use such multivariate gene selection methods alongside the widely used differential expression analysis. © 2018 Wenric and Shemirani.
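As a rough illustration of the first approach (ranking genes by a random forests variable importance measure), the following is a minimal, hypothetical sketch and not the authors' implementation. The abstract only states that "one of the variable importance measures" of random forests was used, so the choice of mean decrease in Gini impurity here is an assumption, as are the function names (gini, best_split, grow, rank_genes), the hyperparameters (n_trees, max_depth, mtry) and the toy data. Samples are rows of X (expression values), genes are columns, and y holds 0/1 class labels. Because only the ranking is needed, the trees are grown solely to accumulate importances and are not stored for prediction.

```python
import math
import random


def gini(labels):
    """Gini impurity of binary 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(X, y, rows, features):
    """Search candidate genes and thresholds for the largest weighted Gini decrease.

    Returns (gain, gene, threshold); gene is None if no split reduces impurity.
    """
    parent = len(rows) * gini([y[r] for r in rows])
    best = (0.0, None, None)
    for g in features:
        values = sorted(set(X[r][g] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between consecutive observed values
            left = [y[r] for r in rows if X[r][g] <= t]
            right = [y[r] for r in rows if X[r][g] > t]
            gain = parent - len(left) * gini(left) - len(right) * gini(right)
            if gain > best[0]:
                best = (gain, g, t)
    return best


def grow(X, y, rows, depth, max_depth, mtry, rng, importance):
    """Grow one tree recursively, adding each split's impurity decrease to importance."""
    if depth >= max_depth or len(rows) < 2 or gini([y[r] for r in rows]) == 0.0:
        return  # stop at max depth, tiny nodes, or pure nodes
    # Random subset of mtry candidate genes at each split (the "random" in random forests).
    features = rng.sample(range(len(X[0])), mtry)
    gain, g, t = best_split(X, y, rows, features)
    if g is None:
        return
    importance[g] += gain  # Gini importance: credit the gene used for this split
    grow(X, y, [r for r in rows if X[r][g] <= t], depth + 1, max_depth, mtry, rng, importance)
    grow(X, y, [r for r in rows if X[r][g] > t], depth + 1, max_depth, mtry, rng, importance)


def rank_genes(X, y, n_trees=100, max_depth=5, mtry=None, seed=0):
    """Rank genes by normalized mean decrease in Gini impurity over a forest.

    Returns (gene indices sorted by decreasing importance, importance scores).
    """
    n_samples, n_genes = len(X), len(X[0])
    mtry = mtry or max(1, int(math.sqrt(n_genes)))  # common default: sqrt(#features)
    rng = random.Random(seed)
    importance = [0.0] * n_genes
    for _ in range(n_trees):
        # Bootstrap sample (drawn with replacement) for this tree.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        grow(X, y, rows, 0, max_depth, mtry, rng, importance)
    total = sum(importance) or 1.0
    scores = [imp / total for imp in importance]
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Hypothetical toy data: 40 samples x 6 genes, two classes of 20;
    # only gene 2 is shifted in class 1, so it is expected to rank first.
    data_rng = random.Random(42)
    labels = [0] * 20 + [1] * 20
    expr = [[data_rng.gauss(0.0, 1.0) for _ in range(6)] for _ in labels]
    for i, label in enumerate(labels):
        expr[i][2] += 4.0 * label
    order, scores = rank_genes(expr, labels)
    print("gene ranking:", order)
    print("top gene:", order[0])
```

In practice, a library implementation would normally be used instead, for example the impurity-based feature_importances_ attribute of scikit-learn's RandomForestClassifier, or the importance() function of the R randomForest package, which also reports a permutation-based measure (mean decrease in accuracy).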
Disciplines :
Computer science
Author, co-author :
Wenric, Stéphane; Université de Liège - ULiège
Shemirani, R.; Department of Computer Science, Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
Language :
English
Title :
Using supervised learning methods for gene selection in RNA-Seq case-control studies
Publication date :
2018
Journal title :
Frontiers in Genetics
eISSN :
1664-8021
Publisher :
Frontiers Media S.A.
Volume :
9
Issue :
AUG
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBi :
since 26 May 2021

Statistics


Number of views
49 (2 by ULiège)
Number of downloads
36 (2 by ULiège)

Scopus citations®
39
Scopus citations® without self-citations
38
OpenCitations
26
