Unsupervised learning reveals landscape of local structural motifs across protein classes

Published: 2023-12-05

Author:

Derry Alexander, Altman Russ B.

Abstract

Proteins are known to share similarities in local regions of 3D structure even across disparate global folds. Such correspondences can help to shed light on functional relationships between proteins and identify conserved local structural features that lead to function. Self-supervised deep learning on large protein structure datasets has produced high-fidelity representations of local structural microenvironments, enabling comparison of local structure and function at scale. In this work, we leverage these representations to cluster over 15 million environments in the Protein Data Bank, resulting in the creation of a “lexicon” of local 3D motifs which form the building blocks of all known protein structures. We characterize these motifs and demonstrate that they provide valuable information for modeling structure and function at all scales of protein analysis, from full protein chains to binding pockets to individual amino acids. We devise a new protein representation based solely on its constituent local motifs and show that this representation enables state-of-the-art performance on protein structure search and model quality assessment. We then show that this approach enables accurate prediction of drug off-target interactions by modeling the similarity between local binding pockets. Finally, we identify structural motifs associated with pathogenic variants in the human proteome by leveraging the predicted structures in the AlphaFold structure database.
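The general idea of a motif lexicon and a motif-based protein representation can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not say which clustering algorithm, embedding model, or similarity measure is used, so the code below assumes plain k-means over synthetic 8-dimensional vectors, a normalized motif-count histogram per protein, and cosine similarity. All function names (`build_lexicon`, `assign_motifs`, `motif_profile`, `cosine_similarity`) are hypothetical.

```python
import numpy as np

def assign_motifs(X, centroids):
    """Label each environment embedding with the index of its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_lexicon(X, k, n_iter=10):
    """Cluster embeddings with k-means (an assumption, not the paper's method)."""
    # Farthest-point initialization: start from the first point, then repeatedly
    # add the point that is farthest from the centroids chosen so far.
    chosen = [X[0]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(chosen)[None, :, :]) ** 2).sum(axis=2)
        chosen.append(X[d2.min(axis=1).argmax()])
    centroids = np.array(chosen, dtype=float)
    # Lloyd iterations: reassign points, then move each centroid to its cluster mean.
    for _ in range(n_iter):
        labels = assign_motifs(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def motif_profile(env_embeddings, centroids):
    """Represent a protein as the normalized histogram of its motif labels."""
    labels = assign_motifs(env_embeddings, centroids)
    counts = np.bincount(labels, minlength=len(centroids)).astype(float)
    return counts / counts.sum()

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for learned microenvironment embeddings (assumed data).
rng = np.random.default_rng(0)
centers = np.eye(3, 8) * 10.0  # three well-separated "motifs"

def fake_protein(motif_ids):
    ids = np.array(motif_ids)
    return centers[ids] + 0.1 * rng.standard_normal((len(ids), 8))

protein_a = fake_protein([0, 0, 0, 1, 1, 2])
protein_b = fake_protein([0, 0, 1, 1, 1, 2])
protein_c = fake_protein([2, 2, 2, 2, 2, 2])

lexicon = build_lexicon(np.vstack([protein_a, protein_b, protein_c]), k=3)
profile_a = motif_profile(protein_a, lexicon)
profile_b = motif_profile(protein_b, lexicon)
profile_c = motif_profile(protein_c, lexicon)

print("a vs b:", cosine_similarity(profile_a, profile_b))
print("a vs c:", cosine_similarity(profile_a, profile_c))
```

In this toy setup, proteins a and b share most of their motifs and therefore score as more similar to each other than a does to c, which is the kind of comparison a motif-based representation is meant to support for structure search.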

Publisher

Cold Spring Harbor Laboratory
