Bioinformatic; Contamination; Detection; Metagenomic; Sequencing; Virus; RNA, Viral; Humans; Reproducibility of Results; Metagenomics/methods; Computational Biology; High-Throughput Nucleotide Sequencing/methods; RNA, Viral/analysis; RNA, Viral/genetics; Viruses/genetics; High-Throughput Nucleotide Sequencing; Metagenomics; Viruses; Biotechnology; Structural Biology; Ecology, Evolution, Behavior and Systematics; Physiology; Biochemistry, Genetics and Molecular Biology (all); Agricultural and Biological Sciences (all); Plant Science; Developmental Biology; Cell Biology; General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology
Abstract :
[en] BACKGROUND: High-throughput sequencing (HTS) technologies, complemented by bioinformatic analysis of the generated data, are becoming an important detection technique for virus diagnostics. They have the potential to replace or complement the current PCR-based methods thanks to their improved inclusivity and analytical sensitivity, as well as their overall good repeatability and reproducibility. Cross-contamination, the exchange of genetic material between samples, is a well-known phenomenon in molecular diagnostics. Managing it was a key difficulty during the development of PCR-based detection, and it is now adequately monitored in routine diagnostics. HTS technologies face similar difficulties because of their very high analytical sensitivity. Because a single viral read can be detected among millions of sequencing reads, a detection threshold informed by the estimated level of cross-contamination must be set. Cross-contamination monitoring should therefore be a priority when detecting viruses by HTS technologies.
RESULTS: We present Cont-ID, a bioinformatic tool designed to check for cross-contamination by analysing the relative abundance of virus sequencing reads identified in metagenomic sequencing datasets and their duplication between samples. It can be applied when the samples in a sequencing batch have been processed in parallel in the laboratory together with at least one specific external control, called an alien control. Using 273 real datasets, including 68 virus species from different hosts (fruit tree, plant, human) and several library preparation protocols (ribodepleted total RNA, small RNA and double-stranded RNA), we demonstrated that Cont-ID classifies viral species detections with high accuracy (91%) as either (true) infection or (cross-)contamination. This classification raises confidence in the detection and facilitates the downstream interpretation and confirmation of the results by prioritising the virus detections that should be confirmed.
CONCLUSIONS: Cross-contamination between samples when detecting viruses using HTS (Illumina technology) can be monitored and highlighted by Cont-ID (provided an alien control is present). Cont-ID is based on a flexible methodology relying on the output of bioinformatics analyses of the sequencing reads and considering the contamination pattern specific to each batch of samples. The Cont-ID method is adaptable so that each laboratory can optimise it before its validation and routine use.
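The idea of a detection threshold informed by an alien control can be illustrated with a minimal sketch. This is not the Cont-ID implementation: the record does not give its rules, so the normalisation to reads per million (RPM), the `ratio_cutoff` parameter, the two decision rules and all sample and virus names below are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    sample: str
    virus: str
    reads: int        # reads assigned to this virus in this sample
    total_reads: int  # all reads sequenced for this sample

    @property
    def rpm(self) -> float:
        """Reads per million, so samples of different depth are comparable."""
        return 1e6 * self.reads / self.total_reads


def contamination_threshold(detections, alien_sample, alien_viruses):
    """Highest RPM seen in the alien control for a virus not expected there."""
    leaked = [d.rpm for d in detections
              if d.sample == alien_sample and d.virus not in alien_viruses]
    return max(leaked, default=0.0)


def classify(detections, alien_sample, alien_viruses, ratio_cutoff=0.01):
    """Label each non-control detection (hypothetical rules, not Cont-ID's)."""
    threshold = contamination_threshold(detections, alien_sample, alien_viruses)
    # Highest abundance of each virus anywhere in the batch.
    batch_max = {}
    for d in detections:
        batch_max[d.virus] = max(batch_max.get(d.virus, 0.0), d.rpm)

    calls = {}
    for d in detections:
        if d.sample == alien_sample:
            continue
        if d.virus in alien_viruses:
            # An alien virus cannot truly infect a test sample.
            calls[(d.sample, d.virus)] = "contamination"
        elif d.rpm <= threshold or d.rpm < ratio_cutoff * batch_max[d.virus]:
            # At or below the leak level, or tiny next to another sample.
            calls[(d.sample, d.virus)] = "contamination"
        else:
            calls[(d.sample, d.virus)] = "infection"
    return calls


# Invented batch: one alien control and three test samples.
batch = [
    Detection("alien", "alien_virus", 50_000, 1_000_000),
    Detection("alien", "virus_A", 20, 1_000_000),      # leak into the control
    Detection("S1", "virus_A", 100_000, 2_000_000),
    Detection("S2", "virus_A", 10, 1_000_000),
    Detection("S2", "alien_virus", 5, 1_000_000),
    Detection("S3", "virus_B", 400, 1_000_000),
]
calls = classify(batch, alien_sample="alien", alien_viruses={"alien_virus"})
for key, label in sorted(calls.items()):
    print(key, label)
```

In this sketch the threshold is recomputed for every batch from its own alien control, which mirrors the abstract's point that the contamination pattern is specific to each batch of samples; a laboratory would still need to tune and validate any such rules on its own data.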
Disciplines :
Biochemistry, biophysics & molecular biology
Author, co-author :
Rollin, Johan; Plant Pathology Laboratory, Gembloux Agro-Bio Tech, University of Liège, 5030, Gembloux, Belgium ; DNAVision, 6041, Gosselies, Belgium
Rong, Wei; Plant Pathology Laboratory, Gembloux Agro-Bio Tech, University of Liège, 5030, Gembloux, Belgium
Massart, Sébastien ; Université de Liège - ULiège > TERRA Research Centre > Gestion durable des bio-agresseurs
Language :
English
Title :
Cont-ID: detection of sample cross-contamination in viral metagenomic data.
H2020 - 813542 - INEXTVIR - Innovative Network for Next Generation Training and Sequencing of Virome
Funders :
EU - European Union
Funding text :
We thank Angelo Locicero for technical support and Gladys Rufflard for administrative support. Delphine Masse (ANSES, La Réunion, France), Kathy Crew and John Thomas (Queensland Alliance for Agriculture and Food Innovation, Brisbane, Australia) and Mathieu Chabannes and Marilyne Caruana (CIRAD, Montpellier, France) are also acknowledged for kindly providing Musa reference samples. Special thanks to Marie-Emilie Gauthier and Roberto Barrero for providing dataset G and discussing cross-contamination in viral metagenomes with us. The work has been supported by (1) the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 813542 T (INEXTVIR) and (2) the NGS cross-centre project from the CGIAR Fund and in particular by the Germplasm Health Unit (GHU) of the CGIAR Genebank Platform.