Affiliation:
1. School of Information Science and Engineering, Yunnan University, Kunming 650504, China
Abstract
Apoptosis proteins are strongly related to many diseases and play an indispensable role in maintaining the dynamic balance between cell death and division in vivo. Obtaining localization information on apoptosis proteins is necessary in understanding their function. To date, few researchers have focused on the problem of apoptosis data imbalance before classification, while this data imbalance is prone to misclassification. Therefore, in this work, we introduce a method to resolve this problem and to enhance prediction accuracy. Firstly, the features of the protein sequence are captured by combining Improving Pseudo-Position-Specific Scoring Matrix (IM-Psepssm) with the Bidirectional Correlation Coefficient (Bid-CC) algorithm from position-specific scoring matrix. Secondly, different features of fusion and resampling strategies are used to reduce the impact of imbalance on apoptosis protein datasets. Finally, the eigenvector adopts the Support Vector Machine (SVM) to the training classification model, and the prediction accuracy is evaluated by jackknife cross-validation tests. The experimental results indicate that, under the same feature vector, adopting resampling methods remarkably boosts many significant indicators in the unsampling method for predicting the localization of apoptosis proteins in the ZD98, ZW225, and CL317 databases. Additionally, we also present new user-friendly local software for readers to apply; the codes and software can be freely accessed at https://github.com/ruanxiaoli/Im-Psepssm.
Funder
National Natural Science Foundation of China
Subject
General Immunology and Microbiology,General Biochemistry, Genetics and Molecular Biology,General Medicine
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献