Authors:
Deng Lei, Guan Jihong, Dong Qiwen, Zhou Shuigeng
Abstract
Background
Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far from being solved.
Results
In this paper, we propose an ensemble method that combines bootstrap resampling, SVM-based fusion classifiers, and a weighted voting strategy to overcome the class imbalance problem and make effective use of a wide variety of features. We evaluate the ensemble classifier with 10-fold cross-validation on a dataset extracted from 99 polypeptide chains and obtain an AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than those of existing methods. To broaden the method's usefulness, two special ensemble classifiers are designed to handle the cases of missing homologues and missing structural information, respectively, and their performance remains encouraging. The robustness of the ensemble method is also evaluated by classifying interaction sites among surface residues as well as among all residues in proteins. Moreover, we demonstrate the applicability of the proposed method by identifying interaction sites in the non-structural proteins (NS) of the influenza A virus, which may serve as potential drug target sites.
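The three reported metrics can be illustrated with a minimal, dependency-free sketch. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration. Sensitivity is the fraction of true interaction sites that are recovered, specificity is the fraction of non-interaction residues that are correctly rejected, and AUC is computed here as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = interaction site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two interface residues, three non-interface residues.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, auc(y_true, scores))
```

Because interaction sites are the minority class, accuracy alone can look high for a classifier that rarely predicts a site; reporting sensitivity and specificity together with AUC avoids that trap.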
Conclusion
Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers, trained with the resampling technique, alleviate the class imbalance problem, and combining Sub-EnClassifiers built on a wide variety of feature groups significantly improves prediction performance.
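The general idea of pairing balanced bootstrap resampling with weighted voting can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: a nearest-centroid rule stands in for the SVM base learner so that the example needs no external libraries, each member's voting weight is set to its balanced accuracy on the training data (an assumed weighting scheme), and all names and the toy data are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Stand-in base learner (NOT an SVM): predicts the class of the nearer centroid."""

    def fit(self, X, y):
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def predict(self, x):
        return 1 if sq_dist(x, self.pos) < sq_dist(x, self.neg) else 0


def balanced_bootstrap(X, y, rng):
    """Draw equal numbers of positives and negatives, each with replacement."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    k = len(pos)  # size of the minority class in this toy setting
    sample = [rng.choice(pos) for _ in range(k)] + [rng.choice(neg) for _ in range(k)]
    return sample, [1] * k + [0] * k


def balanced_accuracy(model, X, y):
    """Mean of sensitivity and specificity, used here as the voting weight."""
    tp = sum(1 for x, t in zip(X, y) if t == 1 and model.predict(x) == 1)
    tn = sum(1 for x, t in zip(X, y) if t == 0 and model.predict(x) == 0)
    n_pos = sum(y)
    return 0.5 * (tp / n_pos + tn / (len(y) - n_pos))


def train_ensemble(X, y, n_members=11, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        Xb, yb = balanced_bootstrap(X, y, rng)
        model = CentroidClassifier().fit(Xb, yb)
        members.append((balanced_accuracy(model, X, y), model))
    return members


def predict_ensemble(members, x):
    """Weighted vote: each member votes +1 or -1, scaled by its weight."""
    score = sum(w * (1 if m.predict(x) == 1 else -1) for w, m in members)
    return 1 if score > 0 else 0


# Toy imbalanced data (hypothetical): 5 positives near (2, 2), 20 negatives near (0, 0).
X_pos = [[2.0, 2.0], [2.2, 1.8], [1.8, 2.1], [2.1, 2.3], [1.9, 1.9]]
X_neg = [[(i % 5) * 0.1, (i // 5) * 0.1] for i in range(20)]
X, y = X_pos + X_neg, [1] * 5 + [0] * 20

members = train_ensemble(X, y)
print(predict_ensemble(members, [2.0, 2.0]), predict_ensemble(members, [0.1, 0.1]))
```

Each member sees a class-balanced sample, so no single learner is dominated by the majority class, while the weighted vote lets better-performing members count for more in the final decision.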
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
69 articles.