Author:
Beaulieu Jeremy M., O’Meara Brian C., Zaretzki Russell, Landerer Cedric, Chai Juanjuan, Gilchrist Michael A.
Abstract
We present a new phylogenetic approach, SelAC (Selection on Amino acids and Codons), whose substitution rates are based on a nested model linking protein expression to population genetics. Unlike simpler codon models that assume a single substitution matrix for all sites, our model more realistically represents the evolution of protein-coding DNA under the assumption of consistent, stabilizing selection using a cost-benefit approach. This cost-benefit approach allows us to generate a set of 20 optimal-amino-acid-specific matrix families using just a handful of parameters, and it naturally links the strength of stabilizing selection to protein synthesis levels, which we can estimate. Using a yeast dataset of 100 orthologs for 6 taxa, we find that SelAC fits the data much better than popular models, by 10^4–10^5 AICc units. Our results indicate that there is great potential for more accurate inference of phylogenetic trees and branch lengths from already existing data through the use of nested, mechanistic models. Additional parameters estimated by SelAC indicate that a large amount of non-phylogenetic, but biologically meaningful, information can be inferred from existing data. For example, SelAC's predictions of gene-specific protein synthesis rates correlate well with both empirical measurements (r = 0.33–0.48) and other theoretical predictions (r = 0.45–0.64) for multiple yeast species. SelAC also provides estimates of the optimal amino acid at each site. Finally, because SelAC is a nested approach based on clearly stated biological assumptions, future modifications, such as including shifts in the optimal amino acid sequence within or across lineages, are possible.
Publisher
Cold Spring Harbor Laboratory