Affiliation:
1. School of Science, Dalian Maritime University, Dalian 116026, P. R. China
2. School of Mathematics and Statistics, Lingnan Normal University, Zhanjiang 524048, P. R. China
Abstract
O-glycosylation is a protein posttranslational modification important in regulating almost all cells. It is related to a large number of physiological and pathological phenomena. Recognizing O-glycosylation sites is the key to further investigating the molecular mechanism of protein posttranslational modification. This study aimed to collect a reliable dataset on Homo sapiens and develop an O-glycosylation predictor for Homo sapiens, named Captor, through multiple features. A random undersampling method and a synthetic minority oversampling technique were employed to deal with imbalanced data. In addition, the Kruskal–Wallis (K–W) test was adopted to optimize feature vectors and improve the performance of the model. A support vector machine, due to its optimal performance, was used to train and optimize the final prediction model after a comprehensive comparison of various classifiers in traditional machine learning methods and deep learning. On the independent test set, Captor outperformed the existing O-glycosylation tool, suggesting that Captor could provide more instructive guidance for further experimental research on O-glycosylation. The source code and datasets are available at https://github.com/YanZhu06/Captor/ .
Funder
Innovative Research Group Project of the National Natural Science Foundation of China
Publisher
World Scientific Pub Co Pte Ltd
Subject
Computer Science Applications,Molecular Biology,Biochemistry
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献