Abstract
Gene paralogs are copies of an ancestral gene that appear after gene or full genome duplication. When the two sister gene copies are maintained in the genome, redundancy may release certain evolutionary pressures, allowing one of them to access novel gene functions. Here we focused on the evolutionary history of the three polypyrimidine tract binding protein (PTBP) paralogs and their concurrent evolution of differential codon usage preferences (CUPrefs) in vertebrate species. PTBP1-3 show high identity at the amino acid level (up to 80%), but display strongly different nucleotide composition, divergent CUPrefs and distinct tissue-specific expression levels. Phylogenetic inference suggests that the duplication events leading to the three extant PTBP1-3 lineages predate the basal diversification within vertebrates. We identify a distinct substitution pattern towards GC3-enriching mutations in PTBP1, concurrent with a trend for the use of common codons and for tissue-wide expression. Genomic context analysis shows that the GC3-rich nucleotide composition of PTBP1s is driven by local mutational processes. In contrast, PTBP2s are enriched in AT-ending, rare codons, and display tissue-restricted expression. Nucleotide composition and CUPrefs of PTBP2 are only partly driven by local mutational forces, and could have been shaped by selective forces. Interestingly, trends for the use of the UUG-Leu codon match those of AT-ending codons. Our interpretation is that a combination of directional mutation and selection has differentially shaped CUPrefs of PTBPs in vertebrates: GC-enrichment of PTBP1 is linked to strong and broad tissue expression, while AT-enrichment of PTBP2 and PTBP3 is linked to rare CUPrefs and specialized spatio-temporal expression.
This scenario is compatible with a gene subfunctionalisation process by differential expression regulation associated with the evolution of specific CUPrefs.
Significance Statement
In vertebrates, PTBP paralogs display strong differences in gene composition and gene expression regulation, and their expression in cell culture depends on their codon usage preferences. We show that PTBP1 genes in placental mammals have become GC-rich because of local mutational pressures, resulting in an enrichment of frequently used codons and in strong, tissue-wide expression. In contrast, PTBP2 genes in vertebrates are AT-rich, with a lower contribution of local mutational processes to their specific nucleotide composition; they show a high frequency of rare codons and, in placental mammals, display a restricted expression pattern contrasting with that of PTBP1. The systematic study of composition and expression patterns of gene paralogs can help understand the complex mutation-selection interplay that shapes codon usage bias in multicellular organisms.
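For readers unfamiliar with the GC3 measure used above, the following is a minimal Python sketch of how GC content at third codon positions could be computed for a coding sequence. It is an illustration under stated assumptions, not the procedure used in this study: the function name `gc3` and both toy sequences are hypothetical, and all codons are counted, whereas some analyses exclude Met, Trp and stop codons.

```python
def gc3(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumption: `cds` is an in-frame coding sequence; every codon is
    counted, including Met, Trp and stop codons.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    third_bases = seq[2::3]  # bases at positions 3, 6, 9, ...
    if not third_bases:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Toy sequences invented for illustration (not real PTBP sequences).
# Both encode Met-Leu-Ala-Pro but use different synonymous codons.
gc_rich_toy = "ATGCTGGCCCCC"
at_rich_toy = "ATGTTAGCTCCA"

print(gc3(gc_rich_toy))
print(gc3(at_rich_toy))
```

The two toy sequences encode the same peptide yet differ sharply in GC3, which is the kind of contrast between synonymous codon choices that the abstract describes for PTBP1 and PTBP2.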
Publisher
Cold Spring Harbor Laboratory