Author:
Aydin Zafer, Singh Ajit, Bilmes Jeff, Noble William S
Abstract
Background
Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight.
Results
In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well-established and difficult benchmark (CB513) is 80.3%, which is comparable to the state of the art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than with a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and, using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements.
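For readers unfamiliar with the metric, per-residue accuracy is the fraction of residues whose predicted secondary structure label matches the observed label. The following is a minimal sketch, assuming a three-letter label alphabet (H for helix, E for strand, C for coil) encoded as strings; the function name and the example sequences are hypothetical and are not taken from the paper or its source code.

```python
def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of positions where the two label strings agree."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    # Count positions where the predicted label equals the observed label.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example (illustrative labels, not real data).
predicted = "HHHHCCEEEC"
observed = "HHHCCCEEEE"
accuracy = per_residue_accuracy(predicted, observed)
print(f"{accuracy:.1%}")
```

On a benchmark, this quantity would typically be computed over all residues of all proteins in the set rather than over a single chain.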
Conclusions
We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure. Datasets and source code used in this study are available at http://noble.gs.washington.edu/proj/pssp.
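To make the notion of sparsity concrete, the sketch below zeroes out a chosen fraction of the smallest-magnitude entries in a parameter list. This is generic magnitude-based pruning, shown only as an illustration; the abstract does not describe the paper's sparsification algorithm, and this is not a reimplementation of it. The function name and the parameter values are hypothetical.

```python
def prune_smallest(weights: list[float], sparsity: float) -> list[float]:
    """Set the given fraction of smallest-magnitude entries to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    # Number of entries to remove, rounded to the nearest integer.
    n_remove = round(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    removed = set(order[:n_remove])
    return [0.0 if i in removed else w for i, w in enumerate(weights)]


# Hypothetical parameter values (illustrative only).
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.15, -0.03, 0.6, 0.04]
pruned = prune_smallest(weights, 0.7)
print(pruned)
```

A model with most parameters set to zero can skip the corresponding terms at prediction time, which is one general reason sparsity can lead to faster inference.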
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
31 articles.