Abstract
Purpose
Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create an optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.
Design/methodology/approach
Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. Each descriptor was used to train a separate support vector machine (SVM), and the resulting classifiers were evaluated. Convolutional neural networks (CNNs) with different parameter settings were fine-tuned on two matrix representations of proteins. The CNN decisions were fused with those of the SVMs using the weighted sum rule, and the fused systems were evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.
Findings
In a broad and fair comparison with the literature across four datasets representing a variety of DNA-BP classification tasks, the best ensemble proposed here produced classification results comparable, if not superior, to those of existing methods, demonstrating both the power and the generalizability of the proposed system.
Originality/value
Most DNA-BP methods proposed in the literature are validated on only one (rarely two) datasets/tasks. In this work, the authors report the performance of their general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the best proposed classifier system demonstrate the power of the approach. These results can now serve as baselines for comparison by other researchers in the field.
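The abstract names the weighted sum rule for fusing classifier decisions but does not give its exact form. As a hedged illustration of the generic rule (not the authors' implementation), the sketch below assumes that each classifier outputs one real-valued score per protein (higher meaning more likely DNA-binding), that each classifier's scores are z-score normalized before fusion so SVM margins and CNN outputs share a scale, and that a positive fused score is labeled DNA-binding. The fused score of protein i is then the sum over classifiers k of w_k times classifier k's normalized score for i. The helper names `zscore` and `weighted_sum_rule`, the weights, and the example scores are all hypothetical.

```python
from typing import Sequence


def zscore(scores: Sequence[float]) -> list[float]:
    """Normalize one classifier's scores to zero mean and unit standard deviation.

    Per-classifier z-normalization is an assumption of this sketch, not a
    detail stated in the abstract.
    """
    n = len(scores)
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    if std == 0:
        # All scores identical: this classifier cannot rank the samples.
        return [0.0] * n
    return [(s - mean) / std for s in scores]


def weighted_sum_rule(
    classifier_scores: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Fuse per-sample scores from several classifiers with a weighted sum.

    classifier_scores[k][i] is the score classifier k assigns to sample i.
    Returns fused[i] = sum_k weights[k] * zscore(classifier_scores[k])[i].
    """
    if len(classifier_scores) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    n = len(classifier_scores[0])
    fused = [0.0] * n
    for scores, w in zip(classifier_scores, weights):
        if len(scores) != n:
            raise ValueError("all classifiers must score the same samples")
        for i, s in enumerate(zscore(scores)):
            fused[i] += w * s
    return fused


if __name__ == "__main__":
    # Hypothetical scores for four proteins from two SVMs and one CNN.
    svm_a = [1.2, -0.4, 0.3, -1.5]
    svm_b = [0.8, -0.9, 0.1, -0.2]
    cnn = [0.95, 0.10, 0.60, 0.05]
    # Hypothetical weights: the CNN counts twice as much as each SVM.
    fused = weighted_sum_rule([svm_a, svm_b, cnn], weights=[1.0, 1.0, 2.0])
    # Assumed decision threshold of 0 on the fused score.
    labels = ["DNA-BP" if f > 0 else "non-DNA-BP" for f in fused]
    print(labels)  # prints ['DNA-BP', 'non-DNA-BP', 'DNA-BP', 'non-DNA-BP']
```

Normalizing before summing keeps one classifier with a wide score range from dominating the ensemble; in practice the weights would be chosen experimentally, as the abstract describes for selecting the best fusion.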
Subject
Computer Science Applications, Information Systems, Software