Combining classifiers for improved classification of proteins from sequence or structure-Reference-Cited by-同舟云学术

Combining classifiers for improved classification of proteins from sequence or structure

Published:2008-09-22 Issue:1 Volume:9 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Melvin Iain,Weston Jason,Leslie Christina S,Noble William S

Abstract

Abstract Background Predicting a protein's structural or functional class from its amino acid sequence or structure is a fundamental problem in computational biology. Recently, there has been considerable interest in using discriminative learning algorithms, in particular support vector machines (SVMs), for classification of proteins. However, because sufficiently many positive examples are required to train such classifiers, all SVM-based methods are hampered by limited coverage. Results In this study, we develop a hybrid machine learning approach for classifying proteins, and we apply the method to the problem of assigning proteins to structural categories based on their sequences or their 3D structures. The method combines a full-coverage but lower accuracy nearest neighbor method with higher accuracy but reduced coverage multiclass SVMs to produce a full coverage classifier with overall improved accuracy. The hybrid approach is based on the simple idea of "punting" from one method to another using a learned threshold. Conclusion In cross-validated experiments on the SCOP hierarchy, the hybrid methods consistently outperform the individual component methods at all levels of coverage. Code and data sets are available at http://noble.gs.washington.edu/proj/sabretooth

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-9-389.pdf

Reference21 articles.

1. Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering 1998, 11: 739–747. 10.1093/protein/11.9.739

2. Holm L, Sander C: Protein Structure Comparison by Alignment of Distance Matrices. Journal of Molecular Biology 1993, 233: 123–138. 10.1006/jmbi.1993.1489

3. Ortiz AR, Strauss CEM, Olmea O: MAMMOTH (Matching molecular models obtained from theory): An automated method for model comparison. Protein Science 2002, 11: 2606–2621. 10.1110/ps.0215902

4. Smith T, Waterman M: Identification of common molecular subsequences. Journal of Molecular Biology 1981, 147: 195–197. 10.1016/0022-2836(81)90087-5

5. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: A basic local alignment search tool. Journal of Molecular Biology 1990, 215: 403–410.

Cited by 19 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Theory and algorithms for learning with rejection in binary classification;Annals of Mathematics and Artificial Intelligence;2023-12-13

2. Network-based protein structural classification;Royal Society Open Science;2020-06

3. Integrating radiologist feedback with computer aided diagnostic systems for breast cancer risk prediction in ultrasonic images: An experimental investigation in machine learning paradigm;Expert Systems with Applications;2017-12

4. A comprehensive review and comparison of different computational methods for protein remote homology detection;Briefings in Bioinformatics;2016-11-13

5. Nearest neighbor ensembles for functional data with interpretable feature selection;Chemometrics and Intelligent Laboratory Systems;2015-08