Theory of measurement for site-specific evolutionary rates in amino-acid sequences

Published: 2018-09-07

Author:

Sydykova Dariya K., Wilke Claus O.

Abstract

In the field of molecular evolution, we commonly calculate site-specific evolutionary rates from alignments of amino-acid sequences. For example, catalytic residues in enzymes and interface regions in protein complexes can be inferred from observed relative rates. While numerous approaches exist to calculate amino-acid rates, it is not entirely clear what physical quantities the inferred rates represent and how these rates relate to the underlying fitness landscape of the evolving proteins. Further, amino-acid rates can be calculated in the context of different amino-acid exchangeability matrices, such as JTT, LG, or WAG, and again it is not well understood how the choice of the matrix influences the physical interpretation of the inferred rates. Here, we develop a theory of measurement for site-specific evolutionary rates, by analytically solving the maximum-likelihood equations for rate inference performed on sequences evolved under a mutation–selection model. We demonstrate that for realistic analysis settings the measurement process will recover the true expected rates of the mutation–selection model if rates are measured relative to a naïve exchangeability matrix, in which all exchangeabilities are equal to 1/19. We also show that rate measurements using other matrices are quantitatively close but in general not mathematically equivalent. Our results demonstrate that insights obtained from phylogenetic-tree inference do not necessarily apply to rate inference, and best practices for the former may be deleterious for the latter.

Significance Statement

Maximum likelihood inference is widely used to infer model parameters from sequence data in an evolutionary context. One major challenge in such inference procedures is the problem of having to identify the appropriate model used for inference. Model parameters usually are meaningful only to the extent that the model is appropriately specified and matches the process that generated the data. However, in practice, we don’t know what process generated the data, and most models in actual use are misspecified. To circumvent this problem, we show here that we can employ maximum likelihood inference to make defined and meaningful measurements on arbitrary processes. Our approach uses misspecification as a deliberate strategy, and this strategy results in robust and meaningful parameter inference.

Publisher

Cold Spring Harbor Laboratory

Cited by 3 articles.

1. Genomic remnants of ancestral methanogenesis and hydrogenotrophy in Archaea drive anaerobic carbon cycling; Science Advances; 2022-11-04

2. Genomic remnants of ancestral hydrogen and methane metabolism in Archaea drive anaerobic carbon cycling; 2021-08-02

3. Large-Scale Analyses of Site-Specific Evolutionary Rates across Eukaryote Proteomes Reveal Confounding Interactions between Intrinsic Disorder, Secondary Structure, and Functional Domains; Genes; 2018-11-14