Affiliation:
1. Bioinformatics Laboratory, Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh
2. Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh
3. Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
Abstract
Background:
Protein-protein interaction (PPI) plays a key role in controlling many biological processes and is
central to understanding protein function and disease incidence and to therapy design. However,
identifying PPIs by wet-lab experiments is challenging because it is laborious, time-consuming and
expensive. Computational prediction of PPIs is therefore increasingly used before experimental
validation, since it is less laborious, faster and cheaper.
Objective:
The objective of this study is to develop an improved computational method for predicting PPIs
in Homo sapiens from amino acid sequence features in a supervised learning framework.
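As an illustration of a sequence feature of this kind, the sketch below computes amino acid composition (AAC), the encoding named in the Results. It is a minimal sketch, not the authors' code: it assumes AAC is the fraction of each of the 20 standard residues, that non-standard letters are ignored, and that a protein pair is encoded by concatenating the two 20-dimensional vectors (the hypothetical helper `pair_features`, giving 40 features).

```python
from typing import Dict, List

# The 20 standard amino acids, in a fixed order so feature columns stay stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue in the sequence.

    Non-standard letters (e.g. X, U, B) are skipped in both counts and length.
    """
    counts: Dict[str, int] = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        # No standard residues at all: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a: str, seq_b: str) -> List[float]:
    """Hypothetical pair encoding: concatenate the two 20-dim AAC vectors (40 features)."""
    return aac(seq_a) + aac(seq_b)


if __name__ == "__main__":
    features = pair_features("MKTAYIAK", "GGSSA")
    print(len(features))
```

Concatenation makes the encoding order-dependent (A,B differs from B,A); a symmetric alternative, such as summing the two vectors, is another assumption one could make if pair order should not matter.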
Methods:
We collected 91 experimentally validated positive PPI pairs of human protein sequences from the
IntAct Molecular Interaction Database. We then constructed three datasets with positive-to-negative
PPI sample ratios of 1:1, 1:2 and 1:3, and partitioned each dataset into a training set (80%) and an
independent test set (20%). Each training set was further partitioned into four mutually exclusive
groups of equal size, and each group was interchanged with the independent test group to perform
5-fold cross-validation (CV). We then trained seven candidate classifiers (NN, SVM, LR, NB, KNN, AB
and RF) for each ratio and compared their performance scores to select the best PPI predictor.
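The dataset construction and partitioning described in the Methods can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes negatives are drawn at random from a pool of non-interacting pairs (the source does not say how negatives were obtained), splits the data into five near-equal, label-stratified groups with group 0 serving as the 20% independent test set and groups 1 to 4 as the 80% training set, and rotates each of the five groups out in turn for 5-fold CV. The names `build_dataset`, `five_groups` and `rotations` are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]
Sample = Tuple[Pair, int]  # ((protein_a, protein_b), label), label 1 = interacting


def build_dataset(positives: Sequence[Pair], negative_pool: Sequence[Pair],
                  ratio: int, seed: int = 0) -> List[Sample]:
    """Combine all positives with ratio * len(positives) randomly drawn negatives."""
    rng = random.Random(seed)
    n_neg = ratio * len(positives)
    if n_neg > len(negative_pool):
        raise ValueError("negative pool too small for the requested ratio")
    negatives = rng.sample(list(negative_pool), n_neg)
    data = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    rng.shuffle(data)
    return data


def five_groups(data: Sequence[Sample]) -> List[List[Sample]]:
    """Split into five mutually exclusive, near-equal groups, stratified by label."""
    groups: List[List[Sample]] = [[] for _ in range(5)]
    for label in (1, 0):
        members = [s for s in data if s[1] == label]
        # Round-robin assignment keeps the class ratio roughly equal in every group.
        for i, sample in enumerate(members):
            groups[i % 5].append(sample)
    return groups


def rotations(groups: List[List[Sample]]) -> Iterator[Tuple[List[Sample], List[Sample]]]:
    """Yield (train, held_out) with each group held out once."""
    for k in range(len(groups)):
        train = [s for j, g in enumerate(groups) if j != k for s in g]
        yield train, groups[k]


if __name__ == "__main__":
    # Hypothetical identifiers standing in for real IntAct protein pairs.
    positives = [(f"P{i}", f"Q{i}") for i in range(91)]
    negative_pool = [(f"N{i}", f"M{i}") for i in range(400)]
    data = build_dataset(positives, negative_pool, ratio=2)
    groups = five_groups(data)
    independent_test, training_groups = groups[0], groups[1:]
    for fold, (train, held_out) in enumerate(rotations(groups), start=1):
        print(fold, len(train), len(held_out))
```

Each of the seven classifiers would then be fitted on `train` and scored on `held_out` for every fold and every ratio; the classifiers themselves are omitted here.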
Results:
The random forest (RF) based predictor trained on the 1:2 ratio of positive to negative PPI samples
with amino acid composition (AAC) encoding features gave the most accurate PPI prediction, producing
the highest average 5-fold cross-validation scores for accuracy (93.50%), sensitivity (95.0%),
MCC (85.2%), AUC (0.941) and pAUC (0.236). On the independent test datasets it also achieved the
highest average scores for accuracy (92.0%), sensitivity (94.0%), MCC (83.6%), AUC (0.922) and
pAUC (0.207) compared with the other candidate and existing predictors.
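The reported scores follow standard definitions, sketched below for binary labels (1 = interacting). This is an illustrative sketch rather than the authors' evaluation code: AUC is computed as the Mann-Whitney probability that a random positive scores above a random negative (ties count half), and pAUC is omitted because the source does not state the false-positive-rate range it was computed over.

```python
import math
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "mcc": _safe_div(tp * tn - fp * fn, denom),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _safe_div(wins, len(pos) * len(neg))


if __name__ == "__main__":
    print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))
```

The pairwise AUC is quadratic in the number of samples, which is adequate for datasets of a few hundred pairs such as those described here.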
Conclusion:
The final results strongly suggest that the RF-based predictor is a better model for PPI mapping
in Homo sapiens than the other candidate and existing predictors.
Publisher
Bentham Science Publishers Ltd.
Subject
Biochemistry, General Medicine, Structural Biology
Cited by
5 articles.