The Structure of Evolutionary Model Space for Proteins across the Tree of Life
Author:
Scolaro Gabrielle E.1,
Braun Edward L.1
Affiliation:
1. Department of Biology, University of Florida, Gainesville, FL 32611, USA
Abstract
The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the “model space” for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.
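The classification test described in the abstract (asking whether fit to clade-specific models identifies the source of an alignment) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes each alignment has already been scored under every candidate model by some external likelihood software, and that all models have the same number of free parameters so that log-likelihoods are directly comparable. The function names, clade labels, and log-likelihood values are hypothetical and for illustration only.

```python
from typing import Dict, List, Tuple


def best_fit_model(log_likelihoods: Dict[str, float]) -> str:
    """Return the name of the model with the highest log-likelihood."""
    if not log_likelihoods:
        raise ValueError("no models were scored")
    return max(log_likelihoods, key=log_likelihoods.get)


def classification_accuracy(cases: List[Tuple[str, Dict[str, float]]]) -> float:
    """Fraction of alignments whose best-fitting model matches the true source."""
    if not cases:
        raise ValueError("no alignments were provided")
    correct = sum(
        1 for true_source, scores in cases if best_fit_model(scores) == true_source
    )
    return correct / len(cases)


# Hypothetical log-likelihoods (made-up numbers, not results from the paper).
# Each entry: (true source of the alignment, {model name: log-likelihood}).
example_cases = [
    ("Bacteria", {"Bacteria": -10234.5, "Archaea": -10301.2, "Eukaryota": -10350.8}),
    ("Archaea", {"Bacteria": -8120.4, "Archaea": -8099.7, "Eukaryota": -8105.3}),
    ("Eukaryota", {"Bacteria": -9555.0, "Archaea": -9540.1, "Eukaryota": -9548.6}),
]

accuracy = classification_accuracy(example_cases)
print(f"classification accuracy: {accuracy:.2f}")
```

In this toy input the third alignment fits the archaeal model better than the eukaryotic one, so it is misclassified. That mirrors the kind of error the abstract reports at finer taxonomic scales, although the numbers here carry no empirical meaning.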
Subject
General Agricultural and Biological Sciences; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology
Cited by
1 article.
1. Phylogenomics using Compression Distances: Incorporating Rate Heterogeneity and Amino Acid Properties;Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics;2023-09-03