Abstract :
[en] The genus Gossypium comprises eight diploid genome groups, and the differential amplification of lineage-specific transposable elements (TEs) underlies the three-fold variation in genome size among them. Propagation of genome-specific TEs accounts for the genome size gap between the A and D genomes. Although multiple assemblies of the A genome, the D genome, and the tetraploid genomes have been released, it remains challenging to assemble a complete genome without leaving unanchored scaffolds, because massive numbers of repetitive sequences are scattered across the whole genome. In tetraploid cotton especially, homologous fragments from the A and D subgenomes are difficult to distinguish and to anchor correctly. Research on TEs specific to the A or D genome can uncover the mechanisms of cotton genome speciation and evolution, assist tetraploid genome assembly, and potentially find application in genome editing. Fluorescence in situ hybridization (FISH) is a versatile tool for visualizing the distribution of sequences on chromosomes and plays a vital role in recent cytogenetic research. A growing number of repetitive sequences in the cotton genome have recently been reported and identified with FISH. By combining FISH with bioinformatic analysis, we identified and characterized two genome-specific repetitive sequences, one from the Gossypium A genome and one from the D genome. This work promotes a better understanding of the mechanism of genome evolution and provides genome-specific markers to facilitate accurate genome assembly. The main contents and results are as follows:
(1) Discovery and Annotation of a Novel Transposable Element Family in Gossypium
Fluorescence in situ hybridization (FISH) is an efficient cytogenetic technique for studying chromosome structure. Transposable elements (TEs) are an important component of eukaryotic genomes and can provide insights into their structure and evolution. A FISH probe derived from bacterial artificial chromosome (BAC) clone 299N22 generated brilliant signals on all 26 chromosomes of the diploid cotton A genome (AA, 2n=2x=26) but very few on the diploid D genome (DD, 2n=2x=26). All 26 chromosomes of the A subgenome (At) of tetraploid cotton (AADD, 2n=4x=52) also gave positive signals with this probe, whereas very few signals were observed on the D subgenome (Dt). Sequencing and annotation of BAC clone 299N22 revealed a novel Ty3/gypsy transposon family, which was named CICR. This family is a significant contributor to size expansion in the A (sub)genome but not in the D (sub)genome. Further FISH analysis with the LTR of CICR as a probe revealed that CICR is lineage-specific: massive repeats were found in the A and B genome groups, but not in the C–G genome groups of the genus Gossypium. Molecular evolutionary analysis of CICR suggested that the transposon family was silenced 1–1.5 million years ago (MYA), when the tetraploid cottons formed. Furthermore, the A genomes share greater homology with the B genomes, and the C, E, F, and G genomes likely diverged from a common ancestor before 3.5–4 MYA, the time when CICR appeared. The genomic variation caused by CICR insertions in the A (sub)genome may have played an important role in the speciation of organisms with A genomes. The CICR family is highly repetitive in the A and B genomes of Gossypium but absent from the C–G genomes.
The differential abundance of the CICR family in At and Dt will aid in partitioning subgenome sequences for chromosome assembly during tetraploid genome sequencing, and the proportion of CICR elements in the resulting pseudochromosome sequences offers a way to assess the accuracy of tetraploid genome assemblies. The timeline of CICR expansion provides a new reference for cotton evolutionary analysis, while the impact of CICR insertions on gene function will be a target for further investigation of phenotypic differences between A-genome and D-genome species.
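The assembly check described above can be illustrated with a minimal sketch. It assumes that CICR-like hits on each pseudochromosome are already available as 0-based, half-open intervals (for example, from a repeat annotation step that is not shown). The function names, the example coordinates, and the 5% cutoff are hypothetical and chosen only for illustration; they are not values from this work.

```python
from collections import defaultdict


def repeat_fraction(chrom_lengths, hits):
    """Return {chrom: fraction of bases covered by repeat hits}.

    chrom_lengths: {chrom_name: length_in_bp}
    hits: iterable of (chrom, start, end), 0-based half-open intervals.
    Overlapping or touching hits are merged so no base is counted twice.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        by_chrom[chrom].append((start, end))

    fractions = {}
    for chrom, length in chrom_lengths.items():
        covered = 0
        cur_start = cur_end = None
        for start, end in sorted(by_chrom.get(chrom, [])):
            if cur_end is None or start > cur_end:
                # Close the previous merged interval, if any, and open a new one.
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                # Extend the current merged interval.
                cur_end = max(cur_end, end)
        if cur_end is not None:
            covered += cur_end - cur_start
        fractions[chrom] = covered / length
    return fractions


def guess_subgenome(fraction, threshold=0.05):
    """Label a pseudochromosome by its CICR-like fraction.

    The threshold is an arbitrary illustrative cutoff, not an empirical value.
    """
    return "At-like" if fraction >= threshold else "Dt-like"


# Toy example with made-up coordinates.
lengths = {"chrX": 1000, "chrY": 1000}
cicr_hits = [("chrX", 0, 200), ("chrX", 150, 300), ("chrY", 10, 20)]
fractions = repeat_fraction(lengths, cicr_hits)
labels = {chrom: guess_subgenome(f) for chrom, f in fractions.items()}
```

In practice, a suitable cutoff would have to be calibrated from the CICR content of the diploid A and D genomes. A pseudochromosome whose fraction falls between the two expected levels may indicate scaffolds anchored to the wrong subgenome.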
(2) Identification of a Genome-specific Repetitive Element in the Gossypium D genome
The activity of genome-specific repetitive sequences is the main cause of genome variation between the Gossypium A and D genomes. Through comparative analysis of the two genomes, we retrieved a repetitive element, termed the ICRd motif, that appears frequently in the diploid Gossypium raimondii (D5) genome but rarely in the diploid Gossypium arboreum (A2) genome. We further examined the presence of the ICRd motif on the chromosomes of G. raimondii, G. arboreum, and two tetraploid (AADD) cotton species, Gossypium hirsutum and Gossypium barbadense, by fluorescence in situ hybridization (FISH), and observed that the ICRd motif is present in the D5 genome and the D subgenomes but not in the A2 genome or the A subgenomes. The ICRd motif comprises two components, a variable tandem repeat (TR) region and a conserved sequence (CS). Each component has hundreds of copies that are evenly distributed across the 13 chromosomes of the D5 genome. The ICRd motif (and its repeats) was found to be a common conserved region harbored by ancient long terminal repeat (LTR) retrotransposons. Identification and investigation of the ICRd motif promotes the study of differences between the A and D genomes, facilitates research on Gossypium genome evolution, and assists subgenome identification and genome assembly.