Identifying Pupylation Proteins and Sites by Incorporating Multiple Methods

Published: 2022-04-26 Volume: 13
ISSN: 1664-2392
Container-title: Frontiers in Endocrinology
Short-container-title: Front. Endocrinol.

Author:

Qiu Wang-Ren,Guan Meng-Yue,Wang Qian-Kun,Lou Li-Liang,Xiao Xuan

Abstract

Pupylation is an important post-translational modification of proteins that plays a key role in the cellular function of microorganisms. Accurate prediction of pupylation proteins and their specific sites is of great significance for studying basic biological processes and developing related drugs, because it can greatly reduce experimental costs and improve efficiency. In this work, we first constructed a model for identifying pupylation proteins. To improve this protein prediction model, a KNN scoring matrix based on functional-domain GO annotation and a Word Embedding model were used to extract features, and Random Under-sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE) were applied to balance the dataset. The balanced datasets were then input into Extreme Gradient Boosting (XGBoost). Under 10-fold cross-validation, the accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) were 95.23%, 0.8100, and 0.9864, respectively. For the pupylation site prediction model, six feature encodings (TPC, AAI, One-hot, PseAAC, CKSAAP, and Word Embedding) were used to extract protein sequence features, and the chi-square test was employed for feature selection. Rigorous 10-fold cross-validation indicated that the site model achieves high accuracy and outperforms its existing counterparts. Finally, for the convenience of researchers, PUP-PS-Fuse has been established at https://bioinfo.jcu.edu.cn/PUP-PS-Fuse, with http://121.36.221.79/PUP-PS-Fuse/ as a backup.
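Two of the building blocks named in the abstract, the CKSAAP encoding and the MCC metric, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `cksaap` and `mcc`, the default `k_max=2`, the example peptide, and the choice to skip pairs containing non-standard residues are all assumptions made here for illustration.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(peptide, k_max=2):
    """Composition of k-spaced amino acid pairs (hypothetical helper).

    For each gap k in 0..k_max, count residue pairs separated by k
    positions and normalise by the number of pairs counted, giving
    400 * (k_max + 1) features. Pairs containing a non-standard
    character (e.g. a padding symbol) are skipped; that handling is
    an assumption of this sketch.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(peptide) - k - 1):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when any marginal is empty, a common convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up window centred on a lysine (K), the residue that is pupylated.
    window = "MAQEQTKRGGGGG"
    vector = cksaap(window, k_max=2)
    print(len(vector))        # 1200 features: 400 pairs for each of 3 gaps
    print(mcc(5, 5, 5, 5))    # 0.0: predictions no better than chance
```

In a full pipeline, vectors like this would be concatenated with the other encodings, filtered by a chi-square test, and passed to a classifier. MCC is a natural metric to report alongside ACC here because it stays informative when positive and negative classes are imbalanced.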

Funder

National Natural Science Foundation of China

Publisher

Frontiers Media SA

Subject

Endocrinology, Diabetes and Metabolism

References (63 articles)

1. Recognition of Protein Pupylation Sites by Adopting Resampling Approach;Li;Molecules,2018

2. The Pupylation Pathway and Its Role in Mycobacteria;Barandun;BMC Biol,2012

3. Organismal Differences in Post-Translational Modifications in Histones H3 and H4;Garcia;J Biol Chem,2007

4. Ubiquitin and Ubiquitin-Like Proteins in Protein Regulation;Herrmann;Circ Res,2007

5. Ensemble Learning Method for the Prediction of New Bioactive Molecules;Afolabi;PLoS One,2018

Cited by 3 articles.

1. Stacking-ac4C: an ensemble model using mixed features for identifying n4-acetylcytidine in mRNA;Frontiers in Immunology;2023-11-29

2. Prediction of Plant Ubiquitylation Proteins and Sites by Fusing Multiple Features;Current Bioinformatics;2023-11-20

3. The Mechanism and Biological Functions of Pup-proteasome System;Prog. Biochem. Biophys.;2023