Abstract
The need to maintain protein structure constrains evolution at the sequence level, and patterns of coevolution in homologous protein sequences can be used to predict their 3D structures with high accuracy. Our understanding of the relationship between protein structure and evolution has traditionally been benchmarked by computational models' ability to predict contacts from a single representative, experimentally determined structure per protein family. However, proteins in vivo are highly dynamic and can adopt multiple functionally relevant conformations. Here we demonstrate that interactions that stabilize alternate conformations, as well as those that mediate conformational changes, impose an underappreciated but significant set of evolutionary constraints. We analyze the extent of these constraints across 56 paralogous G protein-coupled receptors (GPCRs), β-arrestin, and ACE2, the human receptor for SARS-CoV-2. Specifically, we observe that contacts found uniquely in molecular dynamics (MD) simulation data and in alternate-conformation crystal structures are successfully predicted by unsupervised language models. In GPCRs, counting these contacts as positives raises the percentage of top contacts predicted by a state-of-the-art language model that are classified as true positives from 69% to 87%. Our results show that protein dynamics impose constraints on molecular evolution and demonstrate the ability of unsupervised language models to measure these constraints.
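As a rough illustration of the kind of evaluation summarized above, the sketch below computes the fraction of the top-k predicted residue pairs that are true positives, first against a single reference contact set and then after adding extra contacts (for example, ones seen only in MD simulations or alternate-conformation structures) as positives. The function names, the default minimum sequence separation of 6 residues (a common convention in contact-prediction benchmarks), and the toy scores are assumptions made for illustration; this is not the authors' evaluation pipeline and the numbers are not data from the study.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a contact is an unordered pair of residue indices.
Pair = Tuple[int, int]


def normalize(pair: Pair) -> Pair:
    """Store each unordered residue pair as (smaller index, larger index)."""
    i, j = pair
    return (i, j) if i < j else (j, i)


def top_k_precision(
    scores: Dict[Pair, float],
    positives: Set[Pair],
    k: int,
    min_separation: int = 6,  # assumed cutoff: ignore pairs close in sequence
) -> float:
    """Fraction of the k highest-scoring pairs (|i - j| >= min_separation) found in `positives`."""
    positive_set = {normalize(p) for p in positives}
    # Keep only pairs far enough apart in sequence, keyed in canonical order.
    candidates = {
        normalize(p): s for p, s in scores.items() if abs(p[0] - p[1]) >= min_separation
    }
    # Rank candidate pairs by predicted score, highest first, and take the top k.
    ranked = sorted(candidates, key=candidates.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in positive_set)
    return hits / len(ranked)


# Hypothetical toy inputs, not data from the study.
scores = {(1, 10): 0.9, (2, 15): 0.8, (3, 20): 0.7, (4, 25): 0.6}
crystal_contacts = {(1, 10), (3, 20)}
md_only_contacts = {(15, 2)}  # pair order does not matter

print(top_k_precision(scores, crystal_contacts, k=4))                     # 0.5
print(top_k_precision(scores, crystal_contacts | md_only_contacts, k=4))  # 0.75
```

In this toy case, the pair (2, 15) is ranked highly by the model but counts as a false positive until the extra contact set is added, which is the effect the abstract describes at scale.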
Publisher
Cold Spring Harbor Laboratory