Abstract
Ancestral sequence reconstruction (ASR) is a phylogenetic method widely used to analyze the properties of ancient biomolecules and to elucidate the mechanisms of molecular evolution. By recapitulating the structural, mechanistic, and functional changes of proteins during their evolution, ASR has been able to address many fundamental and challenging evolutionary questions where more traditional methods have failed. Despite the tangible successes of ASR, the accuracy of its reconstructions is currently unknown, because it is generally impossible to compare resurrected proteins to the true ancient ancestors. Which evolutionary models are the best for ASR? How accurate are the resulting inferences? Here we answer these questions by using cross-validation (CV) methods to investigate and assess the influence of different evolutionary models on ASR.

To evaluate the adequacy of a chosen evolutionary model for predicting extant sequence data, our column-wise CV method iteratively cross-validates each column in an alignment. Unlike other phylogenetic model selection criteria, this method does not require bias correction and does not make restrictive assumptions commonly violated by phylogenetic data. We find that column-wise CV generally provides a more conservative criterion than the AIC by preferring less complex models.

To validate ASR methods, we also use cross-validation to reconstruct each extant sequence in an alignment with ASR methodology, a method we term “extant sequence reconstruction” (ESR). We thus evaluate the accuracy of ASR methodology by comparing ESR reconstructions to the corresponding true sequences. We find that a common measure of the quality of a reconstructed sequence, the average probability, is indeed a good estimate of the fraction of correct amino acids when the evolutionary model is accurate or overparameterized. However, the average probability is a poor measure for comparing reconstructions from different models, because a more accurate phylogenetic model often results in reconstructions with lower probability. While better models may produce reconstructions with lower sequence identity to the true sequences, better models nevertheless produce reconstructions that are more biophysically similar to true ancestors. Hence, model selection is critical for improving reconstruction quality. ESR is a powerful method for validating evolutionary models used for ASR and can be applied in practice to any phylogenetic analysis of real biological sequences.
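
The comparison between average probability and fraction of correct residues can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a leave-one-out reconstruction has already produced, for each site of a held-out extant sequence, a posterior distribution over amino acids; the function names and the toy posteriors below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# One dict per alignment column: amino acid -> posterior probability.
# Assumed to come from an upstream reconstruction step (not shown here).
SitePosteriors = List[Dict[str, float]]


def most_probable_sequence(posteriors: SitePosteriors) -> str:
    """Pick the highest-probability residue at every site."""
    return "".join(max(site, key=site.get) for site in posteriors)


def average_probability(posteriors: SitePosteriors) -> float:
    """Mean posterior probability of the chosen residues.

    If the model is well calibrated, this is the expected fraction
    of sites that are reconstructed correctly.
    """
    if not posteriors:
        raise ValueError("no sites to average over")
    return sum(max(site.values()) for site in posteriors) / len(posteriors)


def fraction_correct(reconstructed: str, true_seq: str) -> float:
    """Observed fraction of sites where the reconstruction matches the truth."""
    if len(reconstructed) != len(true_seq) or not true_seq:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reconstructed, true_seq))
    return matches / len(true_seq)


# Hypothetical toy data: four sites of one held-out extant sequence.
toy_posteriors: SitePosteriors = [
    {"A": 0.9, "G": 0.1},
    {"L": 0.6, "I": 0.4},
    {"K": 0.75, "R": 0.25},
    {"D": 0.75, "E": 0.25},
]
toy_true_sequence = "ALRD"  # the known extant sequence (made up here)

toy_reconstruction = most_probable_sequence(toy_posteriors)
print("reconstruction:     ", toy_reconstruction)
print("average probability:", average_probability(toy_posteriors))
print("fraction correct:   ", fraction_correct(toy_reconstruction, toy_true_sequence))
```

In an ESR-style check, the two numbers would be compared across many held-out sequences: close agreement suggests the probabilities are well calibrated, while a systematic gap points to a misspecified model.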
Publisher
Cold Spring Harbor Laboratory