Deep Generative Models of Protein Structure Uncover Distant Relationships Across a Continuous Fold Space

Author:

Draizen Eli J., Veretnik Stella, Mura Cameron, Bourne Philip E.

Abstract

Our views of fold space implicitly rest upon many assumptions that impact how we analyze, interpret and understand biological systems, from protein structure comparison and classification to function prediction and evolutionary analyses. For instance, is there an optimal granularity at which to view protein structural similarities (e.g., architecture, topology or some other level)? If so, how does it vary with the type of question being asked? Similarly, the discrete/continuous dichotomy of fold space is central in structural bioinformatics, but remains unresolved. Discrete views of fold space bin ‘similar’ folds into distinct, non-overlapping groups; unfortunately, such binning may inherently miss many remote relationships. While hierarchical databases like CATH, SCOP and ECOD represent major steps forward in protein classification, a scalable, objective and conceptually flexible method, with less reliance on assumptions and heuristics, could enable a more systematic and nuanced exploration of fold space, particularly as regards evolutionarily-distant relationships. Building upon a recent ‘Urfold’ model of protein structure, we have developed a new approach to analyze protein structure relationships. Termed ‘DeepUrfold’, this method is rooted in deep generative modeling via variational Bayesian inference, and we find it to be useful for comparative analysis across the protein universe. Critically, DeepUrfold leverages its deep generative model’s embeddings, which represent a distilled, lower-dimensional space of a given protein and its amalgamation of sequence, structure and biophysical properties. Notably, DeepUrfold is structure-guided, versus being purely structure-based, and its architecture allows each trained model to learn protein features (structural and otherwise) that, in a sense, ‘define’ different superfamilies.
Deploying DeepUrfold with CATH suggests a new, mostly-continuous view of fold space: a view that extends beyond simple 3D structural/geometric similarity, towards the realm of integrated sequence↔structure↔function properties. We find that such an approach can quantitatively represent and detect evolutionarily-remote relationships that evade existing methods.

Availability: Our detailed results can be explored at https://bournelab.org/research/DeepUrfold/; the DeepUrfold code is available at http://www.github.com/bouralab/DeepUrfold and data are available at https://doi.org/10.5281/zenodo.6916524.
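
The abstract names variational Bayesian inference but does not spell out an objective. As a generic illustration only, the sketch below computes the evidence lower bound (ELBO) that a standard variational autoencoder maximizes: a reconstruction term minus a KL term that keeps the embedding close to a prior. It assumes a diagonal-Gaussian approximate posterior, a standard-normal prior and a fixed-variance Gaussian decoder; these choices, and the function names `gaussian_kl`, `gaussian_log_likelihood` and `elbo`, are hypothetical and are not taken from the DeepUrfold code.

```python
import math


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def gaussian_log_likelihood(x, x_hat, sigma=1.0):
    """log p(x | z) for an isotropic Gaussian decoder (sigma is an assumption)."""
    n = len(x)
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return -0.5 * sq_err / sigma**2 - 0.5 * n * math.log(2.0 * math.pi * sigma**2)


def elbo(x, x_hat, mu, log_var):
    """Single-sample ELBO estimate: reconstruction term minus KL term."""
    return gaussian_log_likelihood(x, x_hat) - gaussian_kl(mu, log_var)


# Toy input features, a reconstruction, and the parameters of a 2-D embedding.
x = [1.0, 2.0, 3.0]
x_hat = [0.9, 2.1, 2.8]
mu, log_var = [0.3, -0.2], [-0.1, 0.05]

print("KL term:", gaussian_kl(mu, log_var))
print("ELBO   :", elbo(x, x_hat, mu, log_var))
```

Under these assumptions, a higher ELBO means the input is better explained by the trained model, which is the general sense in which a generative model's fit can be used to compare an input against what the model has learned.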

Publisher

Cold Spring Harbor Laboratory
