Author:
Beckel Maximiliano, Kaufman Bruno, Yanovsky Marcelo, Chernomoretz Ariel
Abstract
Although the main steps of the splicing process are similar across eukaryotes, differences in splicing factors, gene architecture, and sequence divergence in splicing signals suggest clade-specific features of splicing and its regulation. In this work we study conserved and divergent signatures embedded in the sequence composition of eukaryotic 5’ splice sites (5’ss). We used a regularized maximum entropy modeling framework to mine for non-trivial two-site correlations in donor sequences of 14 different eukaryotic organisms. Our approach allowed us to accommodate and extend, in a unified framework, many of the regularities observed in previous works, such as the relationship between the frequency of occurrence of natural sequences and the corresponding site’s strength, or the negative epistatic effects between exonic and intronic consensus sites. In addition, through a systematic comparative analysis of 5’ss, we showed that lineage information could be traced not only from single-site frequencies but also from joint di-nucleotide probabilities of donor sequences. Notably, we could also identify specific two-site coupling patterns for plants and for animals, and we argue that these differences, in association with taxon-specific features involving U6 snRNP, could be the basis for differences in splicing regulation previously reported between these groups.
Publisher
Cold Spring Harbor Laboratory