SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model

Published:2022-02-01 Issue:7 Volume:38 Page:1888-1894
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Singh Jaspreet¹, Litfin Thomas¹, Singh Jaswinder¹, Paliwal Kuldip¹, Zhou Yaoqi²³⁴

Affiliation:

1. Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia

2. Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia

3. Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China

4. Peking University Shenzhen Graduate School, Shenzhen 518055, China

Abstract

Motivation: Accurate prediction of protein contact maps is essential for accurate prediction of protein structure and function, and many methods have been developed for this task. However, most methods rely on evolutionary information derived from the protein sequence, which may not exist for many proteins because they lack naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor that takes the output of the pre-trained language model ESM-1b as input, together with a large training set and an ensemble of residual neural networks.

Results: We showed that the proposed method significantly improves over the single-sequence-based predictor SSCpred, with a 15% improvement in F1-score on the independent CASP14-FM test set. It also outperforms the evolutionary-profile-based methods trRosetta and SPOT-Contact, with respective improvements of 48.7% and 48.5% in F1-score on the proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, useful for large-scale prediction.

Availability and implementation: A stand-alone version of SPOT-Contact-LM is available at https://github.com/jas-preet/SPOT-Contact-Single. Direct predictions can also be made at https://sparks-lab.org/server/spot-contact-single. The datasets used in this research can also be downloaded from the GitHub repository.

Supplementary information: Supplementary data are available at Bioinformatics online.
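The abstract compares methods by the F1-score of their predicted contact maps. As a rough illustration of what that metric measures, the sketch below derives a binary contact map from residue coordinates and scores a prediction against it. It is not the authors' evaluation code: the function names `contact_map` and `f1_score`, the 8 Å distance cutoff, and the minimum sequence separation of 6 are assumptions chosen for illustration and are not stated in this record.

```python
import numpy as np

def contact_map(coords, threshold=8.0, min_sep=6):
    """Binary contact map from an (L, 3) array of residue coordinates.

    threshold (in angstroms) and min_sep are illustrative assumptions.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residues.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Sequence separation |i - j| for every residue pair.
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    return (dist < threshold) & (sep >= min_sep)

def f1_score(pred, true):
    """F1-score over the unique residue pairs (upper triangle, i < j)."""
    pred = np.asarray(pred, dtype=bool)
    true = np.asarray(true, dtype=bool)
    iu = np.triu_indices(true.shape[0], k=1)
    p, t = pred[iu], true[iu]
    tp = np.sum(p & t)    # predicted contact, true contact
    fp = np.sum(p & ~t)   # predicted contact, no true contact
    fn = np.sum(~p & t)   # missed true contact
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0
```

A predictor that outputs probabilities would first need to be thresholded (or have its top-ranked pairs selected) to obtain the boolean `pred` array; how that is done in the paper is not described in this record.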

Funder

Australian Research Council

Shenzhen Science and Technology Program

Major Program of Shenzhen Bay Laboratory

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac053/42466318/btac053.pdf

References: 39 articles.

1. ProteinNet: a standardized data set for machine learning of protein structure;AlQuraishi;BMC Bioinformatics,2019

2. The Pfam protein families database;Bateman;Nucleic Acids Res,2004

3. SSCpred: single-sequence-based protein contact prediction using deep fully convolutional network;Chen;J. Chem. Inf. Model,2020

4. Estimation of model accuracy in CASP13;Cheng;Proteins Struct. Funct. Bioinf,2019

5. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation;Chicco;BMC Genomics,2020

Cited by 32 articles.

1. PCP-GC-LM: single-sequence-based protein contact prediction using dual graph convolutional neural network and convolutional neural network;BMC Bioinformatics;2024-09-02

2. Using Attention-UNet Models to Predict Protein Contact Maps;Journal of Computational Biology;2024-07-01

3. Application of Transformers in Cheminformatics;Journal of Chemical Information and Modeling;2024-05-30

4. Freeprotmap: waiting-free prediction method for protein distance map;BMC Bioinformatics;2024-05-04

5. Transformer Encoder with Protein Language Model for Protein Secondary Structure Prediction;Engineering, Technology & Applied Science Research;2024-04-02