Comprehensive assessment of protein loop modeling programs on large-scale datasets: prediction accuracy and efficiency

Published:2023-11-22 Issue:1 Volume:25
ISSN:1467-5463
Container-title:Briefings in Bioinformatics
Language:en

Author:

Wang Tianyue¹, Wang Langcheng², Zhang Xujun¹, Shen Chao¹, Zhang Odin¹, Wang Jike¹, Wu Jialu¹, Jin Ruofan³, Zhou Donghao⁴, Chen Shicheng¹, Liu Liwei⁵, Wang Xiaorui⁶, Hsieh Chang-Yu¹, Chen Guangyong⁷, Pan Peichen¹, Kang Yu¹, Hou Tingjun¹

Affiliation:

1. College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China

2. Department of Pathology, New York University Medical Center, 550 First Avenue, New York, NY 10016, USA

3. College of Life Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China

4. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China

5. Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Shenzhen 518129, Guangdong, China

6. State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macao, China

7. Zhejiang Lab, Zhejiang University, Hangzhou 311121, Zhejiang, China

Abstract

Protein loops play a critical role in the dynamics of proteins and are essential for numerous biological functions, and various computational approaches to loop modeling have been proposed over the past decades. However, a comprehensive understanding of the strengths and weaknesses of each method is lacking. In this work, we constructed two high-quality datasets (i.e. the General dataset and the CASP dataset) and systematically evaluated the accuracy and efficiency of 13 commonly used loop modeling approaches from the perspective of loop lengths, protein classes and residue types. The results indicate that the knowledge-based method FREAD generally outperforms the other tested programs, but encounters challenges when predicting loops longer than 15 and 30 residues on the CASP and General datasets, respectively. The ab initio method Rosetta NGK demonstrated exceptional modeling accuracy for short loops with four to eight residues and achieved the highest success rate on the CASP dataset. The well-known AlphaFold2 and RoseTTAFold require more resources for better performance, but they exhibit promise for predicting loops longer than 16 and 30 residues in the CASP and General datasets, respectively. These observations can provide valuable insights for selecting suitable methods for specific loop modeling tasks and contribute to future advancements in the field.

Funder

National Key Research and Development Program of China

National Natural Science Foundation of China

Fundamental Research Funds for the Central Universities

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bib/article-pdf/25/1/bbad486/54943752/bbad486.pdf

References: 86 articles.

1. Fast protein loop sampling and structure prediction using distance-guided sequential chain-growth Monte Carlo method;Tang;PLoS Comput Biol,2014

2. Current approaches to flexible loop modeling;Barozet;Curr Res Struct Biol,2021

3. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction;Rysavy;Protein Sci,2014

4. G-quadruplex conformation and dynamics are determined by loop length and sequence;Tippana;Nucleic Acids Res,2014

5. Structure and dynamics of GPCR signaling complexes;Hilger;Nat Struct Mol Biol,2018

Cited by 2 articles.

1. Orphan G protein-coupled receptors: the ongoing search for a home;Frontiers in Pharmacology;2024-02-29

2. Highly Accurate and Efficient Deep Learning Paradigm for Full-Atom Protein Loop Modeling with KarmaLoop;Research;2024-01