The whole transcriptome contains information about nonsense, missense, silent, in-frame, and frameshift mutations, as observed at the whole-exome level, as well as splicing and (allelic) gene-expression changes that DNA analysis misses. One important step in the analysis of RNA-seq gene expression data is the detection of differential expression (DE). Several methods are available, and the choice among them is sometimes controversial. A reliable DE analysis that reduces the number of false-positive DE genes and estimates gene expression levels accurately requires a suitable normalization approach, including correction for confounders. Several normalization methods have been proposed to correct for both within-sample and between-sample biases. RUV (Removing Unwanted Variation) is one of them and has the advantage of correcting for batch effects, including potentially unknown sources of unwanted variation in gene expression. In this study, we compare RUV (RUVg using in silico negative control genes) with more commonly used RNA-seq normalization methods, such as DESeq2, edgeR, and upper-quartile (UQ) normalization, on real-life Illumina paired-end sequencing data from Estrogen-Receptor-Positive (ER+) breast cancer tissues versus matched controls. The set of in silico empirical negative control genes for RUVg was defined as the least significant DE genes obtained from a first DE analysis performed before RUVg correction. Box plots of relative log expression (RLE) across samples and PCA plots show that RUVg performs well and stabilizes read counts across samples, with clear clustering of biological replicates.
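As an illustration of the RLE diagnostic mentioned in the abstract, the following is a minimal sketch, not the pipeline used in the study. It assumes a genes-by-samples count matrix, a simplified upper-quartile scaling (75th percentile of the nonzero counts in each sample), and a pseudocount of 1 before the log transform. The function names and the toy matrix are hypothetical. RLE is computed as each gene's log count minus that gene's median log count across samples, so well-normalized samples give distributions centered near zero.

```python
import numpy as np


def upper_quartile_normalize(counts):
    """Simplified UQ scaling (assumption: not the exact edgeR procedure).

    Each sample (column) is divided by its 75th percentile of nonzero
    counts, rescaled so the scaling factors average to 1.
    Assumes every sample has at least one nonzero count.
    """
    counts = np.asarray(counts, dtype=float)
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    scale = uq / uq.mean()
    return counts / scale


def relative_log_expression(counts, pseudocount=1.0):
    """Log2 count of each gene minus its median log2 count across samples."""
    log_counts = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    gene_medians = np.median(log_counts, axis=1, keepdims=True)
    return log_counts - gene_medians


# Toy data: 4 genes x 3 samples; samples differ only by sequencing depth.
base = np.array([5.0, 20.0, 80.0, 300.0])
toy_counts = np.outer(base, [1.0, 2.0, 4.0])

rle_before = relative_log_expression(toy_counts)
rle_after = relative_log_expression(upper_quartile_normalize(toy_counts))

# Per-sample medians of RLE are what an RLE box plot would center on.
print("median RLE before:", np.median(rle_before, axis=0))
print("median RLE after: ", np.median(rle_after, axis=0))
```

In this toy case the samples differ only by a depth factor, so the scaling brings every RLE value to zero. On real data, residual spread in the RLE box plots is what points to remaining unwanted variation.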
Research Center/Unit :
GIGA‐R - Giga‐Research - ULiège
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Debit, Ahmed ; Université de Liège - ULiège > Cancer-Human Genetics