Affiliations:
1. Department of Computer & Information Sciences, Covenant University, Ota, 112104, Nigeria
2. Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, 112104, Nigeria
3. Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Covenant University, Ota, 112104, Nigeria
4. Department of Biological Science, Covenant University, Ota, 112104, Nigeria
Abstract
Background:
The use of machine learning models in sequence-based Host-Pathogen Protein-Protein Interaction (HPPPI)
prediction typically requires converting amino acid sequences into feature vectors.
Two approaches to this transformation appear in the literature: the Independent Protein Feature (IPF)
and Merged Protein Feature (MPF) extraction methods. Most studies have adopted the IPF approach,
while others have preferred the MPF method, in which host and pathogen sequences are concatenated
before feature encoding.
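As a minimal sketch of the difference between the two approaches, the Python snippet below uses amino acid composition (AAC) as the feature encoder. AAC, the function names `aac`, `ipf`, and `mpf`, and the example sequences are illustrative assumptions, not the encodings used in this study.

```python
# Minimal sketch contrasting IPF and MPF. Amino acid composition (AAC)
# is an assumed stand-in encoder chosen only for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    residues = [c for c in sequence.upper() if c in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def ipf(host: str, pathogen: str) -> list[float]:
    """IPF-style: encode each protein on its own, then join the two vectors."""
    return aac(host) + aac(pathogen)


def mpf(host: str, pathogen: str) -> list[float]:
    """MPF-style: concatenate the two sequences first, then encode once."""
    return aac(host + pathogen)


host_seq = "MKTAYIAK"      # hypothetical host fragment
pathogen_seq = "GAVLIP"    # hypothetical pathogen fragment
print(len(ipf(host_seq, pathogen_seq)))  # 40
print(len(mpf(host_seq, pathogen_seq)))  # 20
```

With a composition-based encoder, the MPF vector cannot tell host residues from pathogen residues, whereas the IPF vector keeps them in separate halves.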
Objective:
It therefore remains unclear which of these approaches should be adopted to improve HPPPI
prediction. To address this, this work introduces the Extended Protein Feature (EPF)
method.
Methods:
The proposed method combines the predictive capabilities of IPF and MPF while extracting essential
features, handling multicollinearity, and removing features with zero importance. EPF, IPF,
and MPF were tested on bacteria, parasite, virus, and plant HPPPI datasets and evaluated with six
machine learning models: Random Forest (RF), Support Vector Machine (SVM), Multilayer
Perceptron (MLP), Naïve Bayes (NB), Logistic Regression (LR), and Deep Forest (DF).
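The EPF steps can be sketched in Python under stated assumptions: IPF and MPF vectors are concatenated per sample, multicollinearity is handled by greedily dropping any feature whose absolute Pearson correlation with an already kept feature exceeds a threshold (0.9 here), and zero-importance features are removed using importance scores supplied from elsewhere, for example a fitted tree-based model. The threshold, the greedy strategy, the helper names, and the toy data and scores are all hypothetical and not taken from this study.

```python
import math

# Hypothetical EPF-style pipeline: combine IPF and MPF vectors, drop
# highly correlated features, then drop zero-importance features.
# The 0.9 threshold and the importance scores below are assumptions.


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def combine(ipf_rows, mpf_rows):
    """Concatenate each sample's IPF vector with its MPF vector."""
    return [a + b for a, b in zip(ipf_rows, mpf_rows)]


def drop_correlated(rows, threshold=0.9):
    """Keep a column only if |r| <= threshold against every column kept so far."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) <= threshold for k in kept):
            kept.append(j)
    return kept


def drop_zero_importance(kept, importances):
    """Remove kept column indices whose importance score is exactly zero."""
    return [j for j in kept if importances[j] != 0.0]


# Toy data: 4 samples, 2 IPF features and 2 MPF features each.
ipf_rows = [[1, 2], [2, 4], [3, 6], [4, 8]]    # column 1 = 2 * column 0
mpf_rows = [[1, 0], [-1, 0], [1, 0], [-1, 0]]  # column 3 is constant

rows = combine(ipf_rows, mpf_rows)
kept = drop_correlated(rows, threshold=0.9)
importances = [0.5, 0.3, 0.2, 0.0]  # hypothetical scores per combined column
selected = drop_zero_importance(kept, importances)
epf_rows = [[row[j] for j in selected] for row in rows]

print(kept)      # [0, 2, 3]
print(selected)  # [0, 2]
print(epf_rows)  # [[1, 1], [2, -1], [3, 1], [4, -1]]
```

In this toy run, column 1 is removed as collinear with column 0, while the constant column 3 passes the correlation step but is removed because its assumed importance is zero.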
Results:
The results indicated that MPF performed worst overall, whereas IPF performed
better with decision tree-based models such as RF and DF. In contrast, EPF
improved performance with SVM, LR, NB, and MLP and also yielded competitive results with DF
and RF.
Conclusion:
The EPF approach developed in this study yields substantial improvements with
four of the six models evaluated. This suggests that EPF is competitive with
IPF and particularly well-suited to traditional machine learning models.
Publisher
Bentham Science Publishers Ltd.