Transferable deep generative modeling of intrinsically disordered protein conformations

Published: 2024-05-23
Volume: 20, Issue: 5, Page: e1012144
ISSN: 1553-7358
Container title: PLOS Computational Biology
Language: en
Short container title: PLoS Comput Biol

Authors:

Janson Giacomo, Feig Michael

Abstract

Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent from the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder that learns a representation of protein geometry with a diffusion model that samples novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity to those in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
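As a rough illustration of the latent diffusion idea summarized in the abstract (an autoencoder that maps protein geometry into a latent space, plus a diffusion model that samples new points in that space), the sketch below runs standard DDPM ancestral sampling over latent vectors and decodes them back to 3D coordinates. It is not idpSAM's implementation. The linear encoder/decoder, the sizes `L`, `D`, `T`, the linear noise schedule, and the `predict_noise` function are all hypothetical stand-ins; in idpSAM these roles are played by trained transformer networks.

```python
import numpy as np

# Hypothetical sizes: L residues, D latent channels per residue, T diffusion steps.
L, D, T = 8, 4, 50

_rng = np.random.default_rng(0)

# Hypothetical linear "autoencoder": per-residue 3D coordinates <-> D-dim latents.
# A trained transformer autoencoder would replace these two matrices.
W_enc = _rng.normal(size=(3, D)) / np.sqrt(3)
W_dec = np.linalg.pinv(W_enc)  # shape (D, 3); W_enc @ W_dec == I_3 since D >= 3


def encode(xyz):
    """Map coordinates of shape (..., L, 3) to latents of shape (..., L, D)."""
    return xyz @ W_enc


def decode(z):
    """Map latents of shape (..., L, D) back to coordinates of shape (..., L, 3)."""
    return z @ W_dec


# Standard DDPM linear beta schedule (an assumption, not idpSAM's schedule).
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def predict_noise(z_t, t):
    """Placeholder for a trained noise predictor eps_theta(z_t, t).

    For latents drawn from a standard normal, E[eps | z_t] = sqrt(1 - alpha_bar_t) * z_t,
    so this is the exact predictor for that toy data distribution.
    """
    return np.sqrt(1.0 - alpha_bars[t]) * z_t


def sample_latents(n_samples, rng):
    """DDPM ancestral sampling: start from pure noise and denoise step by step."""
    z = rng.standard_normal((n_samples, L, D))
    for t in reversed(range(T)):
        eps = predict_noise(z, t)
        # Posterior mean of z_{t-1} given z_t and the predicted noise.
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            z = mean + np.sqrt(betas[t]) * rng.standard_normal(z.shape)
        else:
            z = mean  # no noise is added at the final step
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    conformations = decode(sample_latents(5, rng))
    print(conformations.shape)  # (5, 8, 3)
```

Because `predict_noise` here is the exact noise predictor for standard-normal latents, the sampler returns latents with unit variance; a trained network would instead steer samples toward the learned distribution of encoded conformations, and the decoder would turn them into structural ensembles.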

Funder

National Institute of General Medical Sciences

Publisher

Public Library of Science (PLoS)

Cited by 1 article.

1. Data Science for Integrated Dynamic Structural Biology—the 21st IUPAB Congress session summary commentary; Biophysical Reviews; 2024-09-02