Affiliation:
1. College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
2. First Affiliated Hospital, Gannan Medical University, Ganzhou, China
Abstract
Background::
Protein lysine crotonylation (Kcr), a newly discovered important post-translational modification (PTM), is typically localized at the transcription start site and regulates gene expression, which is associated with a variety of pathological conditions such as developmen-tal defects and malignant transformation.
Objective::
Identifying Kcr sites is advantageous for the discovery of its biological mechanism and the development of new drugs for related diseases. However, traditional experimental methods for identifying Kcr sites are expensive and inefficient, necessitating the development of new computa-tional techniques.
Methods::
In this work, to accurately identify Kcr sites, we propose a model for ensemble learning called Stacking-Kcr. Firstly, extract features from sequence information, physicochemical proper-ties, and sequence fragment similarity. Then, the two characteristics of sequence information and physicochemical properties are fused using automatic encoder and serial, respectively. Finally, the fused two features and sequence fragment similarity features are then respectively input into the four base classifiers, a meta classifier is constructed using the first level prediction results, and the final forecasting results are obtained.
method:
In this work, to accurately identify Kcr sites, we propose a model for ensemble learning called Stacking-Kcr. Firstly, extract features from sequence information, physicochemical properties, and sequence fragment similarity. Then, the two characteristics of sequence information and physicochemical properties are fused using automatic encoder and serial, respectively. Finally, the fused two features and sequence fragment similarity features are then respectively input into the four base classifiers, a meta classifier is constructed using the first level prediction results, and the final forecasting results are obtained.
Results::
The five-fold cross-validation of this model has achieved an accuracy of 0.828 and an AUC of 0.910. This shows that the Stacking-Kcr method has obvious advantages over traditional machine learning methods. On independent test sets, Stacking-Kcr achieved an accuracy of 84.89% and an AUC of 92.21%, which was higher than 1.7% and 0.8% of other state-of-the-art tools. Addi-tionally, we trained Stacking-Kcr on the phosphorylation site, and the result is superior to the cur-rent model.
Conclusion::
These outcomes are additional evidence that Stacking-Kcr has strong application po-tential and generalization performance.
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics,Genetics,Molecular Biology,Biochemistry