Calibrating the classifier for protein family prediction with protein sequence using machine learning techniques: An empirical investigation

Published: 2023-01-25 Issue: 03 Volume: 21
ISSN: 0219-6913
Container-title: International Journal of Wavelets, Multiresolution and Information Processing
Language: en
Short-container-title: Int. J. Wavelets Multiresolut. Inf. Process.

Author:

Idhaya T.¹, Suruliandi A.¹, Calitoiu Dragos², Raja S. P.³

Affiliation:

1. Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli, Tamil Nadu, India

2. School of Mathematics and Statistics, Carleton University, Canada

3. School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India

Abstract

A gene is the basic unit of heredity: a sequence of nucleotides in deoxyribonucleic acid (DNA) that encodes the synthesis of a protein. Proteins are made up of amino acid residues and are classified to support protein-related research, which includes identifying changes in genes, finding associations with diseases and phenotypes, and identifying potential drug targets. To this end, proteins are studied and classified by family. Because experimental family determination is time-consuming, a computational approach is used instead. Computational protein family prediction involves two important processes: feature selection and classification. Existing approaches are either alignment-based or alignment-free. The drawback of alignment-based approaches is that they search for protein signatures by aligning every available sequence. This study therefore adopts the alignment-free approach, which needs only sequence-based features to predict the protein family and is far more efficient. The sequence-based descriptors considered, however, yield many candidate features, so the best subset must be selected. Classification of proteins is likewise still imperfect. The study therefore compares different approaches to find the best feature selection technique and the best classification technique for protein family prediction. The best classification accuracy, 96%, was obtained with the feature subset chosen by a filter-based feature selection technique and classified with the random forest classifier.
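
The alignment-free pipeline described in the abstract (sequence-based features, then filter-based feature selection) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not say which sequence descriptors or which filter criterion were used, so amino acid composition and a Fisher-style score are assumed here. The function names, toy sequences, and family labels are all hypothetical, and the random forest stage is omitted (it would normally come from a machine learning library).

```python
from collections import Counter
from statistics import mean, pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Alignment-free feature vector: fraction of each standard residue in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def fisher_score(values, labels):
    """Assumed filter criterion: between-class over within-class variance."""
    overall = mean(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    if within > 0:
        return between / within
    # Zero within-class spread: perfectly separating or completely constant.
    return float("inf") if between > 0 else 0.0


def select_top_k(X, y, k):
    """Score each feature without any classifier and keep the k best indices."""
    scores = [fisher_score([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: made-up sequences and family labels, for illustration only.
sequences = ["AAAAKKKK", "AAAKKKKK", "GGGGWWWW", "GGGWWWWW"]
families = ["fam1", "fam1", "fam2", "fam2"]

X = [amino_acid_composition(s) for s in sequences]
selected = select_top_k(X, families, k=2)
print([AMINO_ACIDS[j] for j in selected])
```

A filter method such as this scores every feature independently of the classifier, which is what distinguishes it from wrapper or embedded selection; the reduced feature matrix would then be passed to the classifier of choice.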

Publisher

World Scientific Pub Co Pte Ltd

Subject

Applied Mathematics, Information Systems, Signal Processing

Link

https://www.worldscientific.com/doi/pdf/10.1142/S021969132250045X


Cited by 1 article.

1. Targeted Metabolomics Study on the Effect of Vinegar Processing on the Chemical Changes and Antioxidant Activity of Angelica sinensis;Antioxidants;2023-11-28