Abstract
Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has driven tremendous progress in residue contact prediction, so a comprehensive assessment of current methods on a large-scale benchmark data set is much needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets from a wide range of perspectives. The results show that different methods suit different application scenarios: (1) DL methods based on multiple categories of inputs and large training sets are the best choices for low-contact-density proteins, such as intrinsically disordered ones, and for proteins with shallow multiple sequence alignments (MSAs). (2) With at least 5L effective sequences in the MSA (L is the sequence length), all methods show their best performance, and methods that rely only on the MSA as input achieve results comparable to those of methods that adopt multi-source inputs. (3) Among the top L/5 and L/2 predictions, DL methods predict more hydrophobic interactions, while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods detect more secondary structure interactions, while DL methods can accurately uncover more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches, with the best overall performance. Despite the great success of current DL methods, there is still much room for further improvement: (1) With shallow MSAs, performance is greatly affected. (2) Current methods show lower precision for inter-domain than for intra-domain contact predictions, as well as very large imbalances in precision between intra-domain predictions. (3) Strong prediction similarities among DL methods indicate that more feature types and more diversified models need to be developed. (4) The runtime of most methods can be further optimized.
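The abstract refers to two quantities without defining them: the number of effective sequences in an MSA (the "5L" threshold) and the precision of the top L/k predicted contacts (L/5, L/2). The sketch below shows one common way to compute each. It is not the authors' evaluation code, and several conventions are assumptions: sequences are reweighted by the count of neighbours at 80% or higher identity (gaps compared like ordinary characters), residue pairs closer than 6 positions in sequence are ignored, and true contacts are supplied as a precomputed set of `(i, j)` pairs with `i < j` (for example, from a Cβ–Cβ distance cutoff). The function names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def effective_sequences(msa: Sequence[str], identity: float = 0.8) -> float:
    """Sum of per-sequence weights 1/n_i (assumed convention), where n_i is the
    number of sequences, itself included, sharing >= `identity` fraction of
    identical aligned positions. All rows must have the same aligned length."""
    weights = []
    for a in msa:
        neighbours = 0
        for b in msa:
            # Gaps are compared like any other character in this simple version.
            same = sum(1 for x, y in zip(a, b) if x == y)
            if same / len(a) >= identity:
                neighbours += 1
        weights.append(1.0 / neighbours)
    return sum(weights)


def top_lk_precision(pred: Sequence[Tuple[int, int, float]],
                     true_contacts: Set[Tuple[int, int]],
                     length: int,
                     k: int = 5,
                     min_sep: int = 6) -> float:
    """Fraction of the L/k highest-scoring pairs (i < j, j - i >= min_sep)
    that appear in `true_contacts`."""
    # Drop short-range pairs, then rank the remaining pairs by predicted score.
    candidates = [(i, j, s) for i, j, s in pred if j - i >= min_sep]
    candidates.sort(key=lambda t: t[2], reverse=True)
    n = max(1, length // k)
    top = candidates[:n]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (i, j) in true_contacts)
    # Divide by the number of pairs actually evaluated (<= n).
    return hits / len(top)


if __name__ == "__main__":
    msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
    L = len(msa[0])
    neff = effective_sequences(msa)
    print("Neff:", neff, "| meets 5L threshold:", neff >= 5 * L)

    pred = [(0, 9, 0.9), (1, 8, 0.8), (2, 3, 0.95), (0, 7, 0.1)]
    print("top-L/5 precision:", top_lk_precision(pred, {(0, 9)}, L, k=5))
```

In this toy alignment the first three rows are near-duplicates and share one unit of weight, so Neff is far below 5L. That is the "shallow MSA" regime in which the abstract reports degraded performance. Dividing by `len(top)` rather than by L/k is a design choice; some assessments divide by L/k even when fewer candidate pairs exist.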
Funder
National Key Research and Development Program of China
Strategic Priority CAS Project
National Science Foundation of China
Shenzhen Basic Research Fund
CAS Key Lab
Youth Innovation Promotion Association
the Outstanding Youth Innovation Fund
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics, Cellular and Molecular Neuroscience, Genetics, Molecular Biology, Ecology, Modelling and Simulation, Ecology, Evolution, Behavior and Systematics