Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization

Published:2020-07-01 Issue:Supplement_1 Volume:36 Page:i317-i325
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Spencer Krieger¹, John Kececioglu¹

Affiliation:

1. Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA

Abstract

Motivation: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy.

Method: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods.

Results: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2–10%, and Q3 accuracy by more than 1–3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction.

Availability and implementation: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
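The core template-based pipeline summarized in the abstract (nearest-neighbor voting over fixed-length words, then a dynamic program that picks a physically valid maximum-likelihood labeling) can be illustrated with a minimal Python sketch. It is not Nnessy's implementation. The abstract does not specify the word metric, the template database format, or what "physically valid" means, so the sketch assumes: Hamming distance between equal-length words as the metric; a brute-force scan in place of a metric-space index; a hypothetical 3-state alphabet `HEC` with pseudocount smoothing; and a hypothetical minimum run length per state as the validity constraint. All function names (`residue_probabilities`, `decode`, `q_accuracy`) are illustrative.

```python
import math
from collections import Counter

# Hypothetical 3-state alphabet: H = helix, E = strand, C = coil.
STATES = "HEC"


def hamming(a, b):
    """Hamming distance between equal-length words; a metric, used here as a stand-in."""
    return sum(x != y for x, y in zip(a, b))


def residue_probabilities(seq, templates, w, k, pseudocount=1.0):
    """Estimate per-residue class probabilities by k-nearest-neighbor voting.

    templates: hypothetical database of (word, labels) pairs of length w drawn
    from proteins of known structure. A brute-force scan replaces the
    metric-space index a real tool would use.
    """
    counts = [Counter() for _ in seq]
    for start in range(len(seq) - w + 1):
        word = seq[start:start + w]
        # The k template words closest to this query word (ties keep database order).
        neighbors = sorted(templates, key=lambda t: hamming(word, t[0]))[:k]
        for _, labels in neighbors:
            # Each neighbor votes for the label it carries at every covered position.
            for offset, label in enumerate(labels):
                counts[start + offset][label] += 1
    probs = []
    for c in counts:
        # Pseudocount smoothing so every state keeps a nonzero probability.
        total = sum(c.values()) + pseudocount * len(STATES)
        probs.append({s: (c[s] + pseudocount) / total for s in STATES})
    return probs


def decode(probs, min_len):
    """Maximum-likelihood labeling in which every run of state s has length >= min_len[s]."""
    n = len(probs)
    # prefix[s][i] = sum of log P(state s) over residues 0..i-1.
    prefix = {s: [0.0] * (n + 1) for s in STATES}
    for s in STATES:
        for i, p in enumerate(probs):
            prefix[s][i + 1] = prefix[s][i] + math.log(p[s])
    neg_inf = float("-inf")
    # best[i][s]: best log-likelihood of residues 0..i-1 whose last run is state s ending at i.
    best = [{s: neg_inf for s in STATES} for _ in range(n + 1)]
    back = [{s: None for s in STATES} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for s in STATES:
            for j in range(0, i - min_len[s] + 1):
                run = prefix[s][i] - prefix[s][j]  # residues j..i-1 labeled s
                if j == 0:
                    cand, prev = run, None
                else:
                    # The previous run must use a different state, so runs are maximal.
                    prev = max((t for t in STATES if t != s), key=lambda t: best[j][t])
                    cand = best[j][prev] + run
                if cand > best[i][s]:
                    best[i][s], back[i][s] = cand, (j, prev)
    last = max(STATES, key=lambda s: best[n][s])
    if best[n][last] == neg_inf:
        raise ValueError("no labeling satisfies the minimum run lengths")
    # Follow back-pointers to recover the runs, then reverse them.
    runs, i, s = [], n, last
    while i > 0:
        j, prev = back[i][s]
        runs.append(s * (i - j))
        i, s = j, prev
    return "".join(reversed(runs))


def q_accuracy(predicted, true):
    """Fraction of residues predicted correctly: Q3 or Q8, depending on the alphabet."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


if __name__ == "__main__":
    toy = (
        [{"H": 0.8, "E": 0.1, "C": 0.1}] * 2
        + [{"H": 0.3, "E": 0.6, "C": 0.1}]
        + [{"H": 0.8, "E": 0.1, "C": 0.1}] * 2
    )
    print(decode(toy, {"H": 1, "E": 1, "C": 1}))  # HHEHH
    print(decode(toy, {"H": 1, "E": 2, "C": 1}))  # HHHHH: a lone E run is too short
```

The toy run shows why decoding the whole protein at once matters: with no run-length constraint the per-residue argmax `HHEHH` wins, but once strand runs must span at least two residues, the single `E` is absorbed into the surrounding helix. This segment dynamic program costs O(n²·S²) for n residues and S states, which is acceptable for a sketch; a state machine that caps the tracked run length would be linear in n for fixed bounds.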

Funder

National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/article-pdf/36/Supplement_1/i317/33488847/btaa336.pdf

Cited by 12 articles.

1. Machine learning for predicting protein properties: A comprehensive review;Neurocomputing;2024-09

2. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier;BMC Genomics;2023-12-11

3. Method to Generate Complex Predictive Features for Machine Learning-Based Prediction of the Local Structure and Functions of Proteins;Molecular Biology;2023-02

4. WG-ICRN: Protein 8-state secondary structure prediction based on Wasserstein generative adversarial networks and residual networks with Inception modules;Mathematical Biosciences and Engineering;2023

5. Protein 8-State Secondary Structure Prediction Based on Wasserstein Generative Adversarial Network and Residual Network;Hans Journal of Computational Biology;2023