Affiliation:
1. School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, Hunan, P. R. China
2. Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, Hunan, P. R. China
Abstract
O-glycosylation (Oglyc) plays an important role in various biological processes. The key to understanding the mechanisms of Oglyc is identifying the corresponding glycosylation sites. Two critical steps, feature selection and classifier design, greatly affect the accuracy of computational methods for predicting Oglyc sites. Based on an efficient feature selection algorithm and a classifier capable of handling imbalanced datasets, a new computational method, ChiMIC-based balanced decision table O-glycosylation (CBDT-Oglyc), is proposed. ChiMIC-based balanced decision table for O-glycosylation (CBDT-Oglyc), is proposed to predict Oglyc sites in proteins. Sequence characterization is performed by combining amino acid composition (AAC), undirected composition of [Formula: see text]-spaced amino acid pairs (undirected-CKSAAP) and pseudo-position-specific scoring matrix (PsePSSM). Chi-MIC-share algorithm is used for feature selection, which simplifies the model and improves predictive accuracy. For imbalanced classification, a backtracking method based on local chi-square test is designed, and then cost-sensitive learning is incorporated to construct a novel classifier named ChiMIC-based balanced decision table (CBDT). Based on a 1:49 (positives:negatives) training set, the CBDT classifier achieves significantly better prediction performance than traditional classifiers. Moreover, the independent test results on separate human and mouse glycoproteins show that CBDT-Oglyc outperforms previous methods in global accuracy. CBDT-Oglyc shows great promise in predicting Oglyc sites and is expected to facilitate further experimental studies on protein glycosylation.
Funder
Scientific Research Foundation of Hunan Provincial Education Department
Hunan Institute of Engineering
Publisher
World Scientific Pub Co Pte Ltd
Subject
Computer Science Applications,Molecular Biology,Biochemistry