Eukaryotic genes have a mosaic structure of exons and introns. After transcription, introns must be removed from precursor mRNAs to yield mature mRNAs suitable for protein synthesis. This process, termed splicing, is carried out by a complex known as the spliceosome. Serine/arginine-rich (SR) proteins play essential roles in splicing. They have a modular organization featuring at least one RNA recognition motif (RRM) domain and a carboxyl-terminal RS region enriched in arginine/serine dipeptides. However, their architectures are quite diverse, which has so far complicated their evolutionary analysis.
To investigate the origin and evolution of SR splicing factors, we inferred phylogenies for more than 12,000 RRM domains representing more than 200 broadly sampled organisms. Our results show that all SR proteins share a single ancient origin. Based on refined analyses, we propose a scenario for their diversification into four natural families and a dozen subfamilies. Altogether, this work confirms the homogeneity and antiquity of SR splicing factors while establishing robust phylogenetic relationships between animal and plant proteins.
In this talk, I will focus on the bioinformatics approaches required to carry out such a large-scale phylogenetic analysis.