Evaluation of Genomic Contamination Detection Tools and Influence of Horizontal Gene Transfer on Their Efficiency through Contamination Simulations at Various Taxonomic Ranks
Abstract :
Genomic contamination remains a pervasive challenge in (meta)genomics, prompting the development of numerous detection tools. Despite the attention that this issue has attracted, a comprehensive comparison of the available tools is absent from the literature. Furthermore, the potential effect of horizontal gene transfer on the detection of genomic contamination has been little studied. In this study, we evaluated the detection efficiency of six widely used contamination detection tools. To this end, we developed a simulation framework that uses orthologous group inference as a robust basis for simulating contamination. Additionally, we implemented a variable mutation rate to simulate horizontal gene transfer. Our simulations covered six distinct taxonomic ranks, ranging from phylum to species. The evaluation of contamination levels revealed the suboptimal precision of the tools, which we attribute to significant cases of both over-detection and under-detection, particularly at the genus and species levels. Notably, only so-called “redundant” contamination was reliably estimated. Our findings underscore the necessity of employing a combination of tools, including Kraken2, for accurate assessment of contamination levels. We also demonstrate that none of the assayed tools confused contamination with horizontal gene transfer. Finally, we release CRACOT, a freely accessible contamination simulation framework, which holds promise for evaluating the efficacy of future algorithms.
Disciplines :
Microbiology
Author, co-author :
Cornet, Luc ; Université de Liège - ULiège > Département des sciences de la vie > Phylogénomique des eucaryotes ; BCCM/IHEM, Mycology and Aerobiology, Sciensano, 1050 Brussels, Belgium ; BCCM/MUCL and Laboratory of Mycology, Earth and Life Institute, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
Lupo, Valérian ; Université de Liège - ULiège > Département des sciences de la vie > Phylogénomique des eucaryotes ; BCCM/MUCL and Laboratory of Mycology, Earth and Life Institute, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
Declerck, Stéphane ; BCCM/MUCL and Laboratory of Mycology, Earth and Life Institute, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
Baurain, Denis ; Université de Liège - ULiège > Département des sciences de la vie > Phylogénomique des eucaryotes
Language :
English
Title :
Evaluation of Genomic Contamination Detection Tools and Influence of Horizontal Gene Transfer on Their Efficiency through Contamination Simulations at Various Taxonomic Ranks
Publication date :
10 January 2024
Journal title :
Applied Microbiology
ISSN :
0003-6919
Publisher :
MDPI AG
Volume :
4
Issue :
1
Pages :
124-132
Peer reviewed :
Peer Reviewed verified by ORBi
Tags :
CÉCI : Consortium des Équipements de Calcul Intensif