Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning

Published:2022-06-29 Issue:6 Volume:18 Page:e1010238
ISSN:1553-7358
Container-title:PLOS Computational Biology
language:en
Short-container-title:PLoS Comput Biol

Author:

Lu Alex X., Lu Amy X., Pritišanac Iva, Zarin Taraneh, Forman-Kay Julie D., Moses Alan M.

Abstract

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call “reverse homology”, exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
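The contrastive task described in the abstract (picking a held-out homolog out of a pool of randomly sampled IDRs) can be sketched as a softmax cross-entropy over candidate scores. This is only an illustrative sketch, not the authors' implementation: the function names, the mean pooling of the homolog set, the dot-product scoring, and the toy two-dimensional embeddings are all assumptions introduced here, and the encoder network that would produce the embeddings is omitted.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors (assumed pooling of the homolog set)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(a, b):
    """Dot product, used here as an assumed similarity score."""
    return sum(x * y for x, y in zip(a, b))

def reverse_homology_loss(query_embs, candidate_embs, target_index):
    """Cross-entropy loss where the held-out homolog is the correct candidate.

    query_embs: embeddings of the known homologous IDRs (hypothetical inputs).
    candidate_embs: the held-out homolog plus randomly sampled IDRs.
    target_index: position of the held-out homolog among the candidates.
    Returns (loss, index of the highest-scoring candidate).
    """
    query = mean_vector(query_embs)
    scores = [dot(c, query) for c in candidate_embs]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    loss = log_norm - scores[target_index]
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return loss, predicted

# Toy data: two homologs, and three candidates with the held-out homolog first.
query_embs = [[1.0, 0.0], [1.0, 0.0]]
candidates = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
loss, predicted = reverse_homology_loss(query_embs, candidates, target_index=0)
print(f"loss={loss:.4f}, predicted={predicted}")
```

Minimizing a loss of this kind pushes the embedding of the held-out homolog toward the embeddings of its family and away from unrelated IDRs, which is how conservation can serve as a training signal without labels.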

Funder

Natural Sciences and Engineering Research Council of Canada

Canadian Institutes of Health Research

Canada Research Chairs

Nvidia

Publisher

Public Library of Science (PLoS)

Subject

Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics

References: 125 articles.

1. Intrinsically Disordered Proteins: The Dark Horse of the Dark Proteome;P Kulkarni;Proteomics,2018

2. Classification of intrinsically disordered regions and proteins;R Van Der Lee;Chemical Reviews. American Chemical Society,2014

3. Current Opinion in Structural Biology;NE Davey,2019

4. Intrinsically disordered proteins in cellular signalling and regulation;PE Wright;Nat Rev Mol Cell Biol,2015

Cited by 25 articles.

1. Beyond monopole electrostatics in regulating conformations of intrinsically disordered proteins;PNAS Nexus;2024-08-27

2. PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions;2024-07-24

3. Scorpio: Enhancing Embeddings to Improve Downstream Analysis of DNA sequences;2024-07-23

4. The TSC22D, WNK, and NRBP gene families exhibit functional buffering and evolved with Metazoa for cell volume regulation;Cell Reports;2024-07

5. Chromosome compaction is triggered by an autonomous DNA-binding module within condensin;Cell Reports;2024-07