Bioinformatic pipelines are becoming increasingly complex as the amount of next-generation sequencing (NGS) data keeps growing. Orchestrating them with a simple Bash script is difficult, but bioinformatics workflow managers such as Nextflow provide a framework to overcome these problems. This study used Nextflow's modular DSL2 syntax to develop a bioinformatic pipeline for detecting expression quantitative trait loci (eQTL). The modular design makes it possible to share the large computing demand across different partners and to cope with the data-access limitations often associated with eQTL studies. Based on the runtime and computational resources measured in a test run with pilot data, the new pipeline should be suitable for large-scale eQTL analyses.
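At its core, a cis-eQTL step tests whether the genotype at a variant near a gene is associated with that gene's expression across samples. The Python sketch that follows illustrates that idea under stated assumptions: it is not EQTL-Detect's implementation, the names `cis_scan`, `pearson` and `Hit` and the 1 Mb `window` default are hypothetical, genotypes are assumed to be coded as 0/1/2 alternative-allele dosages, and the p-value uses a normal approximation to the t distribution that is only reasonable for larger sample sizes. Dedicated eQTL tools also adjust for covariates and multiple testing, which this sketch omits.

```python
# Minimal cis-eQTL scan sketch (hypothetical; not EQTL-Detect's implementation).
from dataclasses import dataclass
from math import copysign, inf, sqrt
from statistics import NormalDist


@dataclass
class Hit:
    gene: str
    snp: str
    r: float         # Pearson correlation between dosage and expression
    t: float         # t-statistic for the slope of expression ~ dosage
    p_approx: float  # two-sided p-value, normal approximation to the t distribution


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))  # clamp floating-point overshoot


def cis_scan(genes, snps, expression, dosages, window=1_000_000):
    """Test every SNP within `window` bp of each gene's TSS.

    genes:      {gene_id: (chrom, tss_position)}
    snps:       {snp_id: (chrom, position)}
    expression: {gene_id: [expression value per sample]}
    dosages:    {snp_id: [0/1/2 alternative-allele count per sample]}
    Returns hits sorted by ascending approximate p-value.
    """
    hits = []
    for gene, (g_chrom, tss) in genes.items():
        y = expression[gene]
        n = len(y)
        if n < 3 or len(set(y)) < 2:
            continue  # too few samples or constant expression: nothing to test
        for snp, (s_chrom, pos) in snps.items():
            if s_chrom != g_chrom or abs(pos - tss) > window:
                continue  # variant lies outside the cis window
            x = dosages[snp]
            if len(set(x)) < 2:
                continue  # monomorphic in these samples: no test possible
            r = pearson(x, y)
            # Testing slope = 0 in y ~ x is equivalent to testing r = 0.
            t = r * sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else copysign(inf, r)
            p = 2 * (1 - NormalDist().cdf(abs(t)))
            hits.append(Hit(gene, snp, r, t, p))
    return sorted(hits, key=lambda h: h.p_approx)
```

Because each gene's scan is independent of every other gene's, this kind of work can be split by gene or chromosome into separate tasks, which is the sort of parallelism a workflow manager such as Nextflow can schedule across compute nodes or partner sites.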
Disciplines:
Genetics & genetic processes
Author, co-author:
Chitneedi, Praveen Krishna; Research Institute for Farm Animal Biology (FBN), Dummerstorf, Germany
Hadlich, Frieder; Research Institute for Farm Animal Biology (FBN), Dummerstorf, Germany
Moreira, Gabriel C. M.; Unit of Animal Genomics, GIGA Institute, University of Liège, Liège, Belgium
Espinosa-Carrasco, Jose; Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
Li, Changxi; Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Canada; Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, Lacombe, Canada
Plastow, Graham; Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Canada
Fischer, Daniel; Natural Resources Institute Finland (Luke), Green Technology, Animal and Plant Genomics and Breeding, Jokioinen, Finland
Charlier, Carole; GIGA Medical Genomics, GIGA, Université de Liège (ULiège), Liège, Belgium
Rocha, Dominique; Université Paris-Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France
Chamberlain, Amanda J.; Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, Australia
Kuehn, Christa; Research Institute for Farm Animal Biology (FBN), Dummerstorf, Germany; Faculty of Agricultural and Environmental Science, University of Rostock, Rostock, Germany; Friedrich-Loeffler-Institut (FLI), Federal Research Institute for Animal Health, Greifswald, Germany
Language:
English
Title:
EQTL-Detect: Nextflow-based pipeline for eQTL detection in modular format with sharable and parallelizable scripts