BACKGROUND: The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year, providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences.
RESULTS: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries.
CONCLUSIONS: The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.
Disciplines :
Genetics & genetic processes; Zoology
Author, co-author :
Peona, Valentina; Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, SE-752 36, Sweden. valentina.peona@nrm.se ; Swiss Ornithological Institute Vogelwarte, Sempach, CH-6204, Switzerland. valentina.peona@nrm.se ; Department of Bioinformatics and Genetics, Swedish Natural History Museum, Stockholm, Sweden. valentina.peona@nrm.se
Martelossi, Jacopo; Department of Biological Geological and Environmental Science, University of Bologna, Via Selmi 3, Bologna, 40126, Italy. jacopo.martelossi2@unibo.it
Almojil, Dareen; New York University Abu Dhabi, Saadiyat Island, United Arab Emirates
Bocharkina, Julia; Skolkovo Institute of Science and Technology, Moscow, Russia
Brännström, Ioana; Natural History Museum, Oslo University, Oslo, Norway ; Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
Brown, Max; Anglia Ruskin University, East Rd, Cambridge, CB1 1PT, UK
Cang, Alice; University of Arizona, Tucson, AZ, USA
Carrasco-Valenzuela, Tomàs; Evolutionary Genetics Department, Leibniz Institute for Zoo and Wildlife Research, 10315, Berlin, Germany ; Berlin Center for Genomics in Biodiversity Research, 14195, Berlin, Germany
DeVries, Jon; Reed College, Portland, OR, United States of America
Doellman, Meredith; Department of Ecology and Evolution, The University of Chicago, Chicago, IL, 60637, USA ; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA
Elsner, Daniel; Evolutionary Biology & Ecology, University of Freiburg, Freiburg, Germany
Espíndola-Hernández, Pamela; Research Unit Comparative Microbiome Analysis (COMI), Helmholtz Zentrum München, Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
Montoya, Guillermo Friis; Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
Gaspar, Bence; Institute of Evolution and Ecology, University of Tuebingen, Tuebingen, Germany
Zagorski, Danijela; Institute of Botany, Czech Academy of Sciences, Průhonice, Czech Republic
Hałakuc, Paweł; Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
Ivanovska, Beti; Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Budapest, Hungary
Laumer, Christopher; The Natural History Museum, Cromwell Road, London, SW6 7SJ, UK
Lehmann, Robert; Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Boštjančić, Ljudevit Luka; LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325, Frankfurt, Germany
Mashoodh, Rahia; Department of Genetics, Environment & Evolution, Centre for Biodiversity & Environment Research, University College London, London, UK
Mazzoleni, Sofia; Department of Ecology, Faculty of Science, Charles University, Prague, Czech Republic
Mouton, Alice ; Université de Liège - ULiège > Département des sciences et gestion de l'environnement (Arlon Campus Environnement) > Socio-économie, Environnement et Développement (SEED)
Nilsson, Maria Anna; LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325, Frankfurt, Germany
Pei, Yifan; Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, SE-752 36, Sweden ; Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 127, 53113, Bonn, Germany
Potente, Giacomo; Department of Systematic and Evolutionary Botany, University of Zurich, Zurich, Switzerland
Provataris, Panagiotis; German Cancer Research Center, NGS Core Facility, DKFZ-ZMBH Alliance, 69120, Heidelberg, Germany
Pardos-Blas, José Ramón; Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales (MNCN-CSIC), José Gutiérrez Abascal 2, Madrid, 28006, Spain
Raut, Ravindra; Department of Biotechnology, National Institute of Technology Durgapur, Durgapur, India
Sbaffi, Tomasa; Molecular Ecology Group (MEG), National Research Council of Italy - Water Research Institute (CNR-IRSA), Verbania, Italy
Schwarz, Florian; Eurofins Genomics Europe Pharma and Diagnostics Products & Services Sales GmbH, Ebersberg, Germany
Stapley, Jessica; Plant Pathology Group, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
Stevens, Lewis; Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Sultana, Nusrat; Department of Botany, Jagannath University, Dhaka, 1100, Bangladesh
Symonova, Radka; Institute of Hydrobiology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czech Republic
Tahami, Mohadeseh S; Department of Biological and Environmental Science, University of Jyväskylä, P.O. Box 35, Jyväskylä, 40014, Finland
Urzì, Alice; Centogene GmbH, Am Strande 7, 18055, Rostock, Germany
Yang, Heidi; Department of Ecology & Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, United States of America
Yusuf, Abdullah; Zell- und Molekularbiologie der Pflanzen, Technische Universität Dresden, Dresden, Germany
Suh, Alexander; Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, SE-752 36, Sweden. a.suh@leibniz-lib.de ; School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TU, UK. a.suh@leibniz-lib.de ; Present address: Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 160, 53113, Bonn, Germany. a.suh@leibniz-lib.de
Open access funding provided by Uppsala University. This study was supported by grants from the Swedish Research Council Vetenskapsrådet (2020-04436 to AS; 2022-06195 to VP), the Swedish Research Council Formas (2017-01597 to AS), the Canziani bequest and the ‘Ricerca Fondamentale Orientata’ (RFO) funding from the University of Bologna to JM. Part of the analyses were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX), National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725 and CSC-IT Finland. We thank the three anonymous reviewers and Irina Arkhipova for their useful and detailed comments on the manuscript.