Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure-Reference-Cited by-同舟云学术

Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure

Published:2011-05-13 Issue:1 Volume:12 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Aydin Zafer,Singh Ajit,Bilmes Jeff,Noble William S

Abstract

Abstract Background Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight. Results In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well established and difficult benchmark (CB513) is 80.3%, which is comparable to the state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove up to 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements. Conclusions We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure. Datasets and source code used in this study are available at http://noble.gs.washington.edu/proj/pssp.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-12-154.pdf

Reference58 articles.

1. Qian N, Sejnowski TJ: Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology 1988, 202(4):865–884. 10.1016/0022-2836(88)90564-5

2. Zvelebil MJ, Barton GJ, Taylor WR, Sternberg MJ: Prediction of protein secondary structure and active sites using the alignment of homologous sequences. Journal of Molecular Biology 1987, 195: 957–961. 10.1016/0022-2836(87)90501-8

3. Asai K, Hayamizu S, Handa KI: Prediction of protein secondary structure by the hidden Markov model. Comp Applic Biosci 1993, 9(2):141–146.

4. Carugo O, Eisenhaber F: Data Mining Techniques for the Life Sciences, New York: Humana Press and Springer Bussiness Media, Volume 609 of Methods in Molecular Biology. 2010, chap 19: 327–348.

5. Yao XQ, Zhu H, She ZS: A dynamic Bayesian network approach to protein secondary structure prediction. BMC Bioinformatics 2008., 9(49):

Cited by 31 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. PROTEIN STRUCTURE PREDICTION: AN IN-DEPTH COMPARISON OF APPROACHES AND TOOLS;Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi - C Yaşam Bilimleri Ve Biyoteknoloji;2024-01-30

2. IGPRED-MultiTask: A Deep Learning Model to Predict Protein Secondary Structure, Torsion Angles and Solvent Accessibility;IEEE/ACM Transactions on Computational Biology and Bioinformatics;2023-03-01

3. SuccSPred2.0: A Two-Step Model to Predict Succinylation Sites Based on Multifeature Fusion and Selection Algorithm;Journal of Computational Biology;2022-10-01

4. Prediction of protein structural class based on symmetrical recurrence quantification analysis;Computational Biology and Chemistry;2021-06

5. IGPRED : Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction;Proteins: Structure, Function, and Bioinformatics;2021-05-25