Abstract
Compensating substitutions happen when a mutation is positively selected because it restores the fitness lost to a previous deleterious mutation. How frequently such mutations occur in evolution, and what structural and functional context permits their emergence, remain open questions. We built an atlas of intra-protein compensatory substitutions using a phylogenetic approach and a dataset of 1,630 bacterial protein families for which high-quality sequence alignments and experimentally derived protein structures were available. We identified more than 51,000 positions coevolving by means of predicted compensatory mutations. Using the evolutionary and structural properties of the analyzed positions, we demonstrate that compensatory mutations are scarce (typically only a few in a protein's history) but widespread (the majority of proteins experienced at least one). Typical coevolving residues evolve slowly, are located in the protein core outside secondary structure motifs, and are more often in contact than expected by chance, even after accounting for their evolutionary rate and solvent exposure. Residues coevolving for charge compensation are an exception to this general scheme: they evolve faster than non-coevolving sites, in contradiction with predictions from simple coevolutionary models, but similarly to stem pairs in RNA. While sites with a significant pattern of coevolution by compensatory mutations are rare, the comparative analysis of hundreds of structures ultimately permits a better understanding of the link between the three-dimensional structure of a protein and its fitness landscape.
Publisher
Cold Spring Harbor Laboratory