Detection of single nucleotide polymorphisms in virus genomes assembled from high-throughput sequencing data: large-scale performance testing of sequence analysis strategies

Rollin, Johan; Brostaux, Yves; Massart, Sébastien

doi:10.7717/peerj.15816

Article (Scientific journals)

2023 • In PeerJ

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/303637

DOI
10.7717/peerj.15816

Files

Full Text

PT_reviewPeerJ_clean-revision2.docx

Author preprint (223.13 kB)

All documents in ORBi are protected by a user license.

Details

Keywords :

bioinformatic; virus; variant; SNPs

Abstract :

Recent developments in high-throughput sequencing (HTS) technologies and bioinformatics have drastically changed research in virology, especially for virus discovery. Proper monitoring of a viral population requires information on the different isolates circulating in the studied area, and HTS has greatly facilitated the sequencing of new genomes of detected viruses and their comparison. However, the bioinformatics analyses used to reconstruct genome sequences and detect single nucleotide polymorphisms (SNPs) can introduce bias, and this issue has not been widely addressed so far. More knowledge is therefore required on the limitations of predicting SNPs from HTS-generated sequence samples. To address this issue, we compared the ability of 14 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 21 variants of pepino mosaic virus (PepMV) in three samples through large-scale performance testing (PT) using three artificially designed datasets. To evaluate the impact of the bioinformatics analyses, they were divided into three key steps: read pre-processing, virus-isolate identification, and variant calling. Each step was evaluated independently through an original PT design that included discussion and validation between participants at each step. Overall, this work underlines key parameters influencing SNP detection and proposes recommendations for reliable variant calling for plant viruses. The identification of the closest reference, the mapping parameters, and manual validation of the detections were recognized as the analysis steps with the greatest impact on the success of SNP detection. Strategies to improve the prediction of SNPs are also discussed.
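As an illustration of what comparing pipelines against a set of known variants can look like, the sketch below scores one participant's SNP calls against an expected variant list. It is a minimal example under stated assumptions, not the scoring procedure used in the study: the abstract does not specify the metrics, so the `Snp` record, the `score_calls` function, the choice of sensitivity and precision, and all positions and alleles are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Snp(NamedTuple):
    """A single-nucleotide variant (hypothetical record for this sketch)."""
    position: int  # 1-based position on the reference genome
    ref: str       # reference base
    alt: str       # alternative base


def score_calls(expected: set[Snp], called: set[Snp]) -> dict[str, float]:
    """Compare called SNPs with the expected ones using exact matching.

    Assumption: a call counts as correct only if position, reference base
    and alternative base all match an expected variant.
    """
    true_pos = expected & called   # expected variants that were found
    false_neg = expected - called  # expected variants that were missed
    false_pos = called - expected  # calls that match no expected variant
    return {
        "true_positives": len(true_pos),
        "false_negatives": len(false_neg),
        "false_positives": len(false_pos),
        "sensitivity": len(true_pos) / len(expected) if expected else 0.0,
        "precision": len(true_pos) / len(called) if called else 0.0,
    }


# Made-up example data: four expected variants, three calls.
expected = {Snp(10, "A", "G"), Snp(250, "C", "T"),
            Snp(4001, "G", "A"), Snp(5120, "T", "C")}
called = {Snp(10, "A", "G"), Snp(250, "C", "T"), Snp(777, "A", "C")}

result = score_calls(expected, called)
print(result)
```

Exact matching on position and alleles is the strictest possible rule. Because the abstract identifies the choice of the closest reference as an influential step, positions called against different references would need to be converted to a common coordinate system before a comparison of this kind makes sense.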

Disciplines :

Genetics & genetic processes
Agriculture & agronomy

Author, co-author :

Rollin, Johan ; Université de Liège - ULiège > TERRA Research Centre

Brostaux, Yves ; Université de Liège - ULiège > TERRA Research Centre > Modélisation et développement

Massart, Sébastien ; Université de Liège - ULiège > TERRA Research Centre > Gestion durable des bio-agresseurs

Language :

English

Title :

Detection of single nucleotide polymorphisms in virus genomes assembled from high-throughput sequencing data: large-scale performance testing of sequence analysis strategies

Publication date :

2023

Journal title :

PeerJ

eISSN :

2167-8359

Publisher :

PeerJ, United States - California

Peer reviewed :

Peer Reviewed verified by ORBi

Available on ORBi :

since 07 June 2023
