DECISION TREE BASED INFORMATION INTEGRATION FOR AUTOMATED PROTEIN CLASSIFICATION

Published:2005-06 Issue:03 Volume:03 Page:717-742
ISSN:0219-7200
Container-title:Journal of Bioinformatics and Computational Biology
language:en
Short-container-title:J. Bioinform. Comput. Biol.

Author:

ÇAMOĞLU ORHAN¹, CAN TOLGA¹, SINGH AMBUJ K.¹, WANG YUAN-FANG¹

Affiliation:

1. Department of Computer Science, University of California at Santa Barbara, Santa Barbara, CA 93106, USA

Abstract

We propose a novel technique for automatically generating the SCOP classification of a protein structure with high accuracy. We achieve accurate classification by combining the decisions of multiple methods using the consensus of a committee (or ensemble) classifier. Our technique, based on decision trees, is rooted in machine learning, which shows that by judiciously employing component classifiers, an ensemble classifier can be constructed to outperform its components. We use two sequence-comparison and three structure-comparison tools as component classifiers. Given a protein structure and using the joint hypothesis, we first determine whether the protein belongs to an existing category (family, superfamily, fold) in the SCOP hierarchy. For the proteins that are predicted to be members of existing categories, we compute their family-, superfamily-, and fold-level classifications using the consensus classifier. We show that we can significantly improve the classification accuracy compared to the individual component classifiers. In particular, we achieve error rates that are 3–12 times lower than the individual classifiers' error rates at the family level, 1.5–4.5 times lower at the superfamily level, and 1.1–2.4 times lower at the fold level.
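
The committee idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: it replaces their decision trees with a plain confidence-weighted vote, and the tool names, SCOP-style labels, confidence scores, and the `min_confidence` threshold are all hypothetical values chosen for illustration. It shows only the two-stage shape described above: first decide whether the protein fits an existing category at all, then take the consensus label.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical component output: (predicted SCOP label, confidence in [0, 1]).
Prediction = Tuple[str, float]


def consensus_label(
    predictions: Dict[str, Prediction],
    min_confidence: float = 0.5,  # assumed threshold, not from the paper
) -> Optional[str]:
    """Combine component predictions with a confidence-weighted vote.

    Returns None when no component is confident enough, which this sketch
    treats as "does not belong to an existing category".
    """
    # Stage 1: keep only components that clear the confidence threshold.
    confident = [
        (label, conf)
        for label, conf in predictions.values()
        if conf >= min_confidence
    ]
    if not confident:
        return None

    # Stage 2: sum confidence per label and return the best-supported label.
    totals: Dict[str, float] = defaultdict(float)
    for label, conf in confident:
        totals[label] += conf
    return max(totals, key=totals.get)


# Two sequence-based and three structure-based tools (names are placeholders).
votes = {
    "seq_tool_1": ("a.1.1", 0.90),
    "seq_tool_2": ("a.1.1", 0.60),
    "struct_tool_1": ("b.2.3", 0.80),
    "struct_tool_2": ("a.1.1", 0.70),
    "struct_tool_3": ("b.2.3", 0.55),
}
print(consensus_label(votes))
```

In this toy input three tools agree on one label and two on another, so the vote follows the majority with the larger summed confidence. A learned combiner such as a decision tree would instead be trained on the component scores, which lets it weight tools differently at the family, superfamily, and fold levels.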

Publisher

World Scientific Pub Co Pte Lt

Subject

Computer Science Applications, Molecular Biology, Biochemistry

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0219720005001259

References: 21 articles.

1. Iterated profile searches with PSI-BLAST—a tool for discovery in protein databases

2. The Pfam protein families database

3. Profile hidden Markov models

4. Automated assignment of SCOP and CATH protein structure classifications from FSSP scores

Cited by 19 articles.

1. Intelligent Design of Antithrombotic Peptide Targeting Collagen;Langmuir;2024-04-26

2. Machine Learning to Predict Enzyme–Substrate Interactions in Elucidation of Synthesis Pathways: A Review;Metabolites;2024-03-07

3. EnZymClass: Substrate specificity prediction tool of plant acyl-ACP thioesterases based on ensemble learning;Current Research in Biotechnology;2022

4. EnZymClass: Substrate specificity prediction tool of plant acyl-ACP thioesterases based on Ensemble Learning;2021-07-06

5. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions;Computational and Structural Biotechnology Journal;2017