IFF: Identifying key residues in intrinsically disordered regions of proteins using machine learning

Published: 2023-08-22 Issue: 9 Volume: 32
ISSN:0961-8368
Container-title:Protein Science
language:en
Short-container-title:Protein Science

Author:

Ho Wen‐Lin¹, Huang Hsuan‐Cheng², Huang Jie‐rong¹²³

Affiliation:

1. Institute of Biochemistry and Molecular Biology, National Yang Ming Chiao Tung University, Taipei, Taiwan

2. Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan

3. Department of Life Sciences and Institute of Genome Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan

Abstract

Conserved residues in protein homolog sequence alignments are structurally or functionally important. For intrinsically disordered proteins or proteins with intrinsically disordered regions (IDRs), however, alignment often fails because they lack a steric structure to constrain evolution. Although sequences vary, the physicochemical features of IDRs may be preserved in maintaining function. Therefore, a method to retrieve common IDR features may help identify functionally important residues. We applied unsupervised contrastive learning to train a model with self‐attention neuronal networks on human IDR orthologs. Parameters in the model were trained to match sequences in ortholog pairs but not in other IDRs. The trained model successfully identifies previously reported critical residues from experimental studies, especially those with an overall pattern (e.g., multiple aromatic residues or charged blocks) rather than short motifs. This predictive model can be used to identify potentially important residues in other proteins, improving our understanding of their functions. The trained model can be run directly from the Jupyter Notebook in the GitHub repository using Binder (mybinder.org). The only required input is the primary sequence. The training scripts are available on GitHub (https://github.com/allmwh/IFF). The training datasets have been deposited in an Open Science Framework repository (https://osf.io/jk29b).
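
The contrastive objective described above (match ortholog pairs, do not match other IDRs) can be illustrated with a small sketch. The abstract does not state which loss the authors used; the InfoNCE-style loss below is a common choice for contrastive learning and is an assumption, as are the function names `cosine` and `info_nce_loss`, the temperature value, and the toy two-dimensional embeddings. It is not taken from the IFF repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch (illustrative assumption).

    positives[i] is the embedding of the ortholog paired with anchors[i];
    every other entry of `positives` acts as a negative (an unrelated IDR).
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
        # Cross-entropy with the true ortholog as the target class.
        total += log_sum - logits[i]
    return total / len(anchors)


# Toy embeddings (hypothetical): two human IDRs and their orthologs.
human = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [matched[1], matched[0]]  # orthologs deliberately mispaired

loss_matched = info_nce_loss(human, matched)
loss_shuffled = info_nce_loss(human, shuffled)
```

With correctly paired orthologs the loss is close to zero, while mispairing them raises it sharply; minimizing such a loss pushes embeddings of orthologous IDRs together and those of unrelated IDRs apart.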

Funder

National Science and Technology Council

Publisher

Wiley

Subject

Molecular Biology, Biochemistry

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/pro.4739
