An Extended Feature Representation Technique for Predicting Sequenced-based Host-pathogen Protein-protein Interaction

Published:2025-01 Issue:1 Volume:20 Page:1-17
ISSN:1574-8936
Container-title:Current Bioinformatics
language:en
Short-container-title:CBIO

Author:

Emmanuel Jerry¹², Isewon Itunuoluwa¹³², Olasehinde Grace³⁴, Oyelade Jelili¹³²

Affiliation:

1. Department of Computer & Information Sciences, Covenant University, Ota, 112104, Nigeria

2. Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, 112104, Nigeria

3. Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Covenant University, Ota, 112104, Nigeria

4. Department of Biological Science, Covenant University, Ota, 112104, Nigeria

Abstract

Background: Machine learning models for sequence-based protein-protein interaction (PPI) prediction typically require amino acid sequences to be converted into feature vectors. Two approaches to this transformation appear in the literature: the Independent Protein Feature (IPF) and Merged Protein Feature (MPF) extraction methods. Most studies have adopted the IPF approach, while others have preferred the MPF method, in which host and pathogen sequences are concatenated before feature encoding.

Objective: With both approaches in use, it is unclear which should be adopted for improved host-pathogen protein-protein interaction (HPPPI) prediction. This work therefore introduces the Extended Protein Feature (EPF) method.

Methods: The proposed method combines the predictive capabilities of IPF and MPF while extracting essential features, handling multicollinearity, and removing features with zero importance. EPF, IPF, and MPF were tested on bacteria, parasite, virus, and plant HPPPI datasets using six machine learning models: Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Naïve Bayes (NB), Logistic Regression (LR), and Deep Forest (DF).

Results: MPF exhibited the lowest performance overall, whereas IPF performed better with decision tree-based models such as RF and DF. In contrast, EPF improved performance with SVM, LR, NB, and MLP and yielded competitive results with DF and RF.

Conclusion: The EPF approach developed in this study yields substantial improvements in four of the six models evaluated. This suggests that EPF is competitive with IPF and is particularly well suited to traditional machine learning models.
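The three representations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: amino acid composition (`aac`) stands in for whatever sequence encoder the paper uses; IPF is read as encoding host and pathogen separately and joining the two vectors; the extended vector is assumed to be a simple join of the IPF and MPF vectors; and multicollinearity is handled with a hypothetical pairwise-correlation threshold of 0.95. Removing zero-importance features needs a trained model and is left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: 20 relative frequencies (stand-in encoder)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def ipf(host, pathogen):
    # Assumed reading of IPF: encode each protein on its own, then join vectors.
    return aac(host) + aac(pathogen)


def mpf(host, pathogen):
    # MPF as described in the abstract: join the sequences, then encode once.
    return aac(host + pathogen)


def extended(host, pathogen):
    # Assumption: the extended vector is the IPF vector followed by the MPF vector.
    return ipf(host, pathogen) + mpf(host, pathogen)


def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def drop_collinear(rows, threshold=0.95):
    """Return indices of columns kept after a greedy correlation filter.

    A column is kept only if its absolute correlation with every
    previously kept column is below the (hypothetical) threshold.
    """
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept


if __name__ == "__main__":
    # Toy host-pathogen pairs; the sequences are made up for illustration.
    pairs = [("MKTAYIAK", "GSHMLEDP"), ("ACDEFGHI", "KLMNPQRS"), ("MMKKAAYY", "GGSSHHLL")]
    matrix = [extended(h, p) for h, p in pairs]
    kept = drop_collinear(matrix)
    print(len(matrix[0]), "raw features,", len(kept), "kept after the correlation filter")
```

In a fuller pipeline, the surviving columns would then be passed to a model that exposes feature importances, and columns with zero importance would be dropped before training the final classifier.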

Publisher

Bentham Science Publishers Ltd.