Author:
Volpato Viola, Adelfio Alessandro, Pollastri Gianluca
Abstract
We present a novel ab initio predictor of protein enzymatic class. The predictor classifies proteins, based solely on their sequences, into one of six classes extracted from the Enzyme Commission (EC) classification scheme. It is trained on a large, curated database of over 6,000 non-redundant proteins which we have assembled in this work. The predictor is powered by an ensemble of N-to-1 Neural Networks, a novel architecture which we have recently developed. N-to-1 Neural Networks operate on the full sequence rather than on predefined features. All motifs of a predefined length (31 residues in this work) are considered, and an N-to-1 Neural Network compresses them into a feature vector which is determined automatically during training. We test our predictor in 10-fold cross-validation and obtain state-of-the-art results, with 96% correct classification and 86% generalized correlation. All six classes are predicted with a specificity of at least 80% and false positive rates never exceeding 7%. We are currently investigating enhanced input encoding schemes which include structural information, and are analyzing trained networks to mine the motifs that are most informative for the prediction and hence likely to be functionally relevant.
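The data flow described in the abstract (every 31-residue motif is compressed by a shared network into one feature vector, which is then mapped to six classes) can be illustrated with a minimal sketch. This is not the authors' implementation. The one-hot encoding, the tanh units, the averaging of motif outputs, the feature-vector size of 8, the random untrained weights, and all function names are assumptions made for illustration; only the motif length of 31 and the six output classes come from the abstract.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 31      # motif length stated in the abstract
N_FEATURES = 8   # hypothetical size of the compressed feature vector
N_CLASSES = 6    # six classes taken from the EC scheme


def encode_window(seq, centre, window=WINDOW):
    """One-hot encode the motif centred on `centre`.

    Positions that fall outside the sequence, and unknown residues,
    are left as all-zero vectors (an assumed padding convention).
    """
    half = window // 2
    vec = []
    for pos in range(centre - half, centre + half + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            idx = AMINO_ACIDS.find(seq[pos])
            if idx >= 0:
                one_hot[idx] = 1.0
        vec.extend(one_hot)
    return vec


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(seq, w_motif, w_out):
    """Apply the shared motif network at every position, average, classify."""
    features = [0.0] * len(w_motif)
    for centre in range(len(seq)):
        hidden = [math.tanh(h) for h in matvec(w_motif, encode_window(seq, centre))]
        features = [f + h for f, h in zip(features, hidden)]
    features = [f / len(seq) for f in features]  # assumes a non-empty sequence
    return softmax(matvec(w_out, features))


rng = random.Random(0)
w_motif = random_matrix(N_FEATURES, WINDOW * len(AMINO_ACIDS), rng)
w_out = random_matrix(N_CLASSES, N_FEATURES, rng)

# Arbitrary example sequence, not taken from the paper's dataset.
probs = predict("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", w_motif, w_out)
print(len(probs), round(sum(probs), 6))
```

Because the same weights are applied to every motif and the results are pooled, the feature vector has a fixed size regardless of sequence length, which is what lets a single output layer classify proteins of any length. With untrained weights the six probabilities are close to uniform.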
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Cited by
41 articles.