Chromosomes; Phylogeny; Salinity; Oryza/genetics; Genome, Plant; Genome; Oryza; Statistics and Probability; Information Systems; Education; Computer Science Applications; Statistics, Probability and Uncertainty; Library and Information Sciences
Abstract :
[en] Oryza coarctata (2n = 4x = 48, KKLL) is an allotetraploid, undomesticated relative of rice and the only species in the genus Oryza that tolerates both high salinity and submergence. It therefore carries stress-tolerance genes and factors that are important for rice. The initial draft genome was limited by data and technical restrictions, which led to an incomplete and highly fragmented assembly. This study reports a new, highly contiguous chromosome-level genome assembly and annotation of O. coarctata. PacBio HiFi reads were assembled into 460 contigs with a total length of 573.4 Mb and an N50 of 23.1 Mb; these contigs were then scaffolded with Hi-C data, anchoring 96.99% of the assembly onto 24 chromosomes. The genome assembly comprises 45,571 genes, and repetitive content accounts for 25.5% of the genome. This study newly identifies the KK and LL genome types of the genus Oryza, offering valuable insights into rice genome evolution. The chromosome-level genome assembly of O. coarctata is a valuable resource for rice research and molecular breeding.
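For readers unfamiliar with the contiguity statistic quoted in the abstract, the following is a minimal sketch of how an N50 value is conventionally computed from a set of contig lengths. It is not the authors' pipeline: the function name `n50` and the toy lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2                  # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy lengths (arbitrary units), not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Applied to the real assembly, the input would be the 460 contig lengths; an N50 of 23.1 Mb then means that contigs of at least 23.1 Mb account for half or more of the 573.4 Mb total.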
Disciplines :
Agriculture & agronomy
Author, co-author :
Zhao, Hang ✱; Université de Liège - ULiège > TERRA Research Centre ; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
Wang, Wenzheng ✱; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
Yang, Yirong ✱; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
Wang, Zhiwei ; Université de Liège - ULiège > Gembloux Agro-Bio Tech > Gembloux Agro-Bio Tech ; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
Sun, Jing; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
Yuan, Kaijun; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China ; Duke University, Durham, USA
Rabbi, S M Hisam Al; Bangladesh Rice Research Institute, Gazipur, 1701, Bangladesh
Khanam, Munnujan; Bangladesh Rice Research Institute, Gazipur, 1701, Bangladesh
Kabir, Md Shahjahan; Bangladesh Rice Research Institute, Gazipur, 1701, Bangladesh
Seraj, Zeba I; Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
Rahman, Md Sazzadur ✱; Bangladesh Rice Research Institute, Gazipur, 1701, Bangladesh. sazzadur.phys@brri.gov.bd
Zhang, Zhiguo ✱; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China. zhangzhiguo@caas.cn
✱ These authors have contributed equally to this work.
Language :
English
Title :
A high-quality chromosome-level wild rice genome of Oryza coarctata.
Funders :
Ministry of Science and Technology of the People's Republic of China
Funding text :
This research was funded by the National Key Research and Development Program of China (2022YFF1001700). We thank Dr. Hongbing Liu for his comments and suggestions for improving the manuscript.