Affiliation:
1. School of Computer and Information Engineering, Henan Normal University, Henan, China
Abstract
Background:
Thermophilic proteins maintain good activity at high temperatures, so studying them is
important for understanding the thermal stability of proteins.
Objective:
To address the low precision and low efficiency of existing methods for predicting
thermophilic proteins, this paper proposes a prediction method based on feature fusion and
machine learning.
Methods:
For the selected thermophilic datasets, each protein sequence was first represented by fusing
three feature types: g-gap dipeptide composition, entropy density, and autocorrelation
coefficients. Kernel Principal Component Analysis (KPCA) was then used to reduce the
dimensionality of these sequence features, which shortens training time and improves
efficiency. Finally, classification models were built on the reduced features using
classification algorithms.
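The feature-fusion step can be sketched as follows. This is a minimal standard-library sketch under stated assumptions, since the abstract gives no formulas: g-gap dipeptide composition is taken as the frequencies of residue pairs with g residues between them (g = 0 gives ordinary dipeptides), entropy density as each amino acid's normalized share of the Shannon entropy of the composition, and the autocorrelation coefficient as Moran autocorrelation of Kyte-Doolittle hydrophobicity. The property scale, the parameter values (g = 1, lags 1 to 5) and all function names are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

# Kyte-Doolittle hydrophobicity. ASSUMPTION: the property used by the paper is not stated.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def g_gap_dipeptide(seq, g):
    """400-dim frequencies of residue pairs (seq[i], seq[i + g + 1])."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = len(seq) - g - 1  # number of pair positions in the sequence
    if total <= 0:
        return [0.0] * len(DIPEPTIDES)
    for i in range(total):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # pairs with non-standard residues are not counted
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def entropy_density(seq):
    """20-dim entropy density: -f_i*log(f_i) / H for each amino acid i."""
    n = len(seq)
    freqs = [seq.count(a) / n for a in AMINO_ACIDS] if n else [0.0] * 20
    terms = [-f * math.log2(f) if f > 0 else 0.0 for f in freqs]
    h = sum(terms)  # Shannon entropy of the composition
    return [t / h for t in terms] if h > 0 else [0.0] * 20


def moran_autocorrelation(seq, max_lag, scale=KYTE_DOOLITTLE):
    """Moran autocorrelation of a residue property for lags 1..max_lag."""
    values = [scale[a] for a in seq if a in scale]
    n = len(values)
    if n == 0:
        return [0.0] * max_lag
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    feats = []
    for d in range(1, max_lag + 1):
        if var < 1e-12 or n <= d:
            feats.append(0.0)  # undefined for a constant profile or a too-short sequence
            continue
        cov = sum((values[j] - mean) * (values[j + d] - mean) for j in range(n - d)) / (n - d)
        feats.append(cov / var)
    return feats


def fused_features(seq, g=1, max_lag=5):
    """Serial fusion: concatenate the three descriptor groups into one vector."""
    seq = seq.upper()
    return g_gap_dipeptide(seq, g) + entropy_density(seq) + moran_autocorrelation(seq, max_lag)


vec = fused_features("MKVLAAGIVGLLLAAHHEEKR")
print(len(vec))  # 400 + 20 + 5 = 425
```

Stacking one such vector per sequence gives the matrix that KPCA then compresses. With scikit-learn, for instance, `sklearn.decomposition.KernelPCA` could fill that role; the kernel and number of components are choices the abstract does not specify.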
Results:
Several classification algorithms were trained and tested on the selected thermophilic
dataset. In comparison, the Support Vector Machine (SVM) achieved an accuracy above 92% under
the jackknife (leave-one-out) test, and the other evaluation metrics likewise indicated that
the SVM performed best.
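The jackknife test is leave-one-out cross-validation: each of the n samples is predicted by a model trained on the remaining n - 1. The sketch below assumes the sensitivity (Sn), specificity (Sp), accuracy (Acc) and Matthews correlation coefficient (MCC) metrics common in this field, since the abstract does not name its other evaluation indicators. It treats label 1 as thermophilic and uses a hypothetical nearest-centroid classifier as a stand-in for the SVM so that it runs with the standard library alone.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Sn, Sp, Acc and MCC from a 2x2 confusion matrix (positive = thermophilic)."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def jackknife(make_model, X, y, positive=1):
    """Leave-one-out: fit on n-1 samples, predict the held-out one, repeat n times."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        model = make_model()  # fresh, untrained model for every fold
        model.fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = model.predict([X[i]])[0]
        if y[i] == positive:
            tp, fn = (tp + 1, fn) if pred == positive else (tp, fn + 1)
        else:
            fp, tn = (fp + 1, tn) if pred == positive else (fp, tn + 1)
    return binary_metrics(tp, tn, fp, fn)


class CentroidClassifier:
    """HYPOTHETICAL stand-in for an SVM: predicts the class whose mean vector is nearest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c])) for x in X]


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
print(jackknife(CentroidClassifier, X, y))
# {'Sn': 1.0, 'Sp': 1.0, 'Acc': 1.0, 'MCC': 1.0}
```

Because `jackknife` only needs objects with `fit(X, y)` and `predict(X)`, a scikit-learn estimator could be passed through a factory such as `lambda: SVC(kernel="rbf")`, or `lambda: make_pipeline(KernelPCA(kernel="rbf"), SVC())` to refit the dimensionality reduction inside every fold so the held-out sequence never influences the projection.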
Conclusion:
Because it combines an effective feature representation with a robust classifier, the
proposed method is suitable for predicting thermophilic proteins and outperforms most
previously reported methods.
Funder
Science and Technology Research Key Project of Educational Department of Henan Province
Production and Learning Cooperation and Cooperative Education Project of Ministry of Education of China
Natural Science Foundation of Henan Province
Project of Science and Technology Department of Henan Province of China
National Natural Science Foundation of China
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry
Cited by
115 articles.