Abstract
Fitness conferred by the same allele may differ between genotypes, and these differences shape variation and evolution. Changes in amino acid propensities at protein sites over the course of evolution have been inferred from sequence alignments statistically, but the existing methods are data-intensive and aggregate multiple sites. Here, we develop an approach to detect individual amino acids that confer different fitness in different groups of species from combined sequence and phylogenetic data. Using the fact that the probability of a substitution to an amino acid depends on its fitness, our method looks for amino acids such that substitutions to them occur more frequently in one group of lineages than in another. We validate our method using simulated evolution of a protein site under different scenarios and show that it has high specificity for a wide range of assumptions regarding the underlying changes in selection, while its sensitivity differs between scenarios. We apply our method to the env gene of two HIV-1 subtypes, A and B, and to the HA gene of two influenza A subtypes, H1 and H3, and show that the inferred fitness changes are consistent with the fitness differences observed in deep mutational scanning experiments. We find that changes in relative fitness of different amino acid variants within a site do not always trigger episodes of positive selection and therefore may not result in an overall increase in the frequency of substitutions, but can still be detected from changes in relative frequencies of different substitutions.
Author summary
Which amino acids are acceptable at a certain protein site can change with time. In viruses, for example, this can be due to changes in mechanisms of drug resistance and immune escape in the course of evolution. Here, we develop a method for detecting such changes from how evolutionary events are distributed over an evolutionary tree.
Informally, we infer that a certain amino acid is favored in a certain group of lineages if substitutions giving rise to it repeatedly occur in the evolution of this group, and disfavored if such substitutions are rare. In surface proteins of HIV-1 and influenza A, we find that changes in preferences detected with our method (the d-test) match those observed in deep mutational scanning experiments. Our purely bioinformatic approach allows inference of changes in selection between lineages from sequences alone, shedding light on the functional differences between strains or species even in the absence of any structural or functional data.
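The informal idea above can be illustrated with a deliberately simplified sketch. It assumes that substitution events at a single site have already been inferred on the phylogeny and labeled by lineage group; the `events` list is made-up illustrative data, and `substitution_counts` and `fisher_two_sided` are hypothetical helper names. The sketch compares how often substitutions give rise to a target amino acid in two groups using a plain Fisher exact test. This is a stand-in for illustration and not the d-test itself; in particular, it ignores where on the tree the events fall.

```python
from math import comb


def substitution_counts(events, target):
    """Count substitutions per group: (to the target amino acid, to any other).

    events: iterable of (group_label, derived_amino_acid) pairs, assumed to
    come from an ancestral state reconstruction done elsewhere.
    """
    counts = {}
    for group, derived in events:
        to_target, to_other = counts.get(group, (0, 0))
        if derived == target:
            to_target += 1
        else:
            to_other += 1
        counts[group] = (to_target, to_other)
    return counts


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x target substitutions in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Illustrative (invented) events at one site: (lineage group, derived amino acid).
events = (
    [("A", "K")] * 8 + [("A", "R")] * 2
    + [("B", "K")] * 1 + [("B", "R")] * 5 + [("B", "E")] * 4
)

counts = substitution_counts(events, target="K")
(a, b), (c, d) = counts["A"], counts["B"]
p_value = fisher_two_sided(a, b, c, d)
print(f"to K in A: {a}/{a + b}, in B: {c}/{c + d}, p = {p_value:.4f}")
```

A small p-value here only says that substitutions to the target amino acid are unevenly split between the two groups in this toy table. Treating the events as independent draws is a simplifying assumption of the sketch.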
Publisher
Cold Spring Harbor Laboratory