NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning

Authors:

Høie Magnus Haraldson (1), Kiehl Erik Nicolas (1), Petersen Bent (2, 3), Nielsen Morten (1), Winther Ole (4, 5, 6), Nielsen Henrik (1), Hallgren Jeppe (7), Marcatili Paolo (1)

Affiliations:

1. Department of Health Technology, Technical University of Denmark, DK Lyngby, Denmark

2. Center for Evolutionary Hologenomics, GLOBE Institute, University of Copenhagen, Denmark

3. Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia

4. Section for Cognitive Systems, DTU Compute, Technical University of Denmark (DTU), Denmark

5. Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen, Denmark

6. Department of Biology, Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark

7. BioLib Technologies, Copenhagen, Denmark

Abstract

Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.

Funder

Sino-Danish Center

Publisher

Oxford University Press (OUP)

Subject

Genetics

