Affiliation:
1. College of Information Engineering, Shanghai Maritime University, Shanghai, China
2. Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
3. School of Engineering, University of Nantes, Nantes, France
Abstract
Background:
Nitration is an important post-translational modification (PTM) that occurs
on the tyrosine residues of proteins. Under disease conditions, protein tyrosine nitration
is inevitable and represents a shift from the signal-transducing physiological actions of
•NO (nitric oxide) to oxidative and potentially pathogenic pathways. Abnormal protein nitration can
lead to serious human diseases, including neurodegenerative diseases, acute respiratory distress, organ
transplant rejection and lung cancer.
Objective:
Identifying nitration sites in protein sequences is therefore important. Predicting
which tyrosine residues in a protein sequence are nitrated and which are not is of great significance
for the study of the nitration mechanism and related diseases.
Methods:
In this study, a prediction model for nitration sites was proposed that combines an over- and
under-sampling strategy, the Fast Correlation-Based Filter (FCBF) method, stacking ensemble learning
and multi-feature fusion. First, each protein sequence sample was encoded as a 2701-dimensional fused
feature vector (PseAAC, PSSM, AAIndex, CKSAAP, Disorder). Second, a ranked feature set was generated
by the FCBF method according to the symmetric uncertainty metric. Third, during model
training, over- and under-sampling was used to handle the imbalanced dataset. Finally,
the Incremental Feature Selection (IFS) method was adopted to select the optimal classifier
based on 10-fold cross-validation.
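
The ranking and rebalancing steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes features have already been discretized, it covers only the symmetric uncertainty ranking (FCBF additionally removes redundant features, which is omitted here), and the function names and the `target_per_class` parameter are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    # Mutual information from the joint entropy: I = H(X) + H(Y) - H(X, Y)
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def rank_features(columns, labels):
    """Return feature names sorted by decreasing SU with the class label."""
    scores = {name: symmetric_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rebalance(samples, labels, target_per_class, seed=0):
    """Random over-/under-sampling so each class has target_per_class items."""
    rng = random.Random(seed)
    out = []
    for cls in sorted(set(labels)):
        pool = [s for s, l in zip(samples, labels) if l == cls]
        if len(pool) >= target_per_class:
            # Majority class: under-sample without replacement
            chosen = rng.sample(pool, target_per_class)
        else:
            # Minority class: keep all items, then over-sample with replacement
            chosen = pool + rng.choices(pool, k=target_per_class - len(pool))
        out.extend((s, cls) for s in chosen)
    rng.shuffle(out)
    return out
```

In an IFS-style loop, one would then train a classifier on the top 1, 2, 3, ... ranked features and keep the subset with the best cross-validated score; that loop is left out here because it depends on the chosen classifier.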
Results and Conclusion:
The results show that the model has significant performance advantages in indicators
such as MCC, recall and F1-score, both when compared with other classifiers on the independent
test set and when evaluated by cross-validation on the training set with single-type features
or with fused features. By integrating the FCBF feature ranking method,
the over- and under-sampling technique and a stacking model composed of multiple base classifiers,
an effective prediction model for nitration PTM sites was built, which achieves a better recall
rate when the ratio of positive to negative samples is highly imbalanced.
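
For reference, the three indicators named above can be computed from a binary confusion matrix as in the sketch below. The function names are hypothetical and the code uses the standard textbook definitions, not any code from the paper. It shows why these metrics are preferred over accuracy on imbalanced data: a classifier can score high accuracy while missing most positives.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def recall(tp, fn):
    """Fraction of true positives that were recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, a value in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```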
Funder
National Natural Science Foundation of China
Publisher
Bentham Science Publishers Ltd.
Subject
Molecular Biology, Biochemistry
Cited by
2 articles.