Drug-Protein Interactions Prediction Models Using Feature Selection and Classification Techniques

Published: 2023-12
Issue: 12
Volume: 24
Page: 817-834
ISSN: 1389-2002
Container-title: Current Drug Metabolism
Language: en
Short-container-title: CDM

Author:

Idhaya T.¹, Suruliandi A.¹, Raja S. P.²

Affiliation:

1. Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, India

2. School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, India

Abstract

Background: Drug-Protein Interaction (DPI) identification is crucial in drug discovery. The high dimensionality of drug and protein features poses challenges for accurate interaction prediction, necessitating the use of computational techniques. Docking-based methods rely on 3D structures, while ligand-based methods have limitations such as reliance on known ligands and neglect of protein structure. Therefore, the preferred approach is the chemogenomics-based approach using machine learning, which considers both drug and protein characteristics for DPI prediction.

Methods: In machine learning, feature selection plays a vital role in improving model performance, reducing overfitting, enhancing interpretability, and making the learning process more efficient. It helps extract meaningful patterns from drug and protein data while eliminating irrelevant or redundant information, resulting in more effective machine-learning models. Classification, in turn, is of great importance as it enables pattern recognition, decision-making, predictive modeling, anomaly detection, data exploration, and automation. It empowers machines to make accurate predictions and facilitates efficient decision-making in DPI prediction. For this research work, protein data was sourced from the KEGG database, while drug data was obtained from the DrugBank database.

Results: To address the issue of imbalanced Drug-Protein Pairs (DPP), different balancing techniques, namely Random Over Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive SMOTE, were employed. Given the large number of features associated with drugs and proteins, feature selection becomes necessary. Various feature selection methods were evaluated: Correlation, Information Gain (IG), Chi-Square (CS), and Relief. Multiple classification methods, including Support Vector Machines (SVM), Random Forest (RF), AdaBoost, and Logistic Regression (LR), were used to predict DPI. Finally, this research identifies the best balancing, feature selection, and classification methods for accurate DPI prediction.

Conclusion: This comprehensive approach aims to overcome the limitations of existing methods and provide more reliable and efficient predictions in drug-protein interaction studies.
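
Two of the steps named in the abstract, class balancing (ROS and SMOTE) and chi-square feature scoring, can be illustrated with a short plain-Python sketch. This is not the authors' implementation: the function names `random_over_sample`, `smote_like`, and `chi_square_score`, the neighbour count `k`, and the toy data are all assumptions made for illustration. The chi-square score here is the classic contingency-table statistic for a discrete feature, and `smote_like` assumes the minority class has at least two rows.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """ROS: duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))  # copy of an existing row
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=2, seed=0):
    """SMOTE-style sketch: interpolate between a minority row and one of its
    k nearest minority neighbours (assumes at least two minority rows)."""
    rng = random.Random(seed)
    rows = [x for x, lab in zip(X, y) if lab == minority]
    counts = Counter(y)
    need = max(counts.values()) - counts[minority]
    X_out, y_out = list(X), list(y)
    for _ in range(need):
        base = rng.choice(rows)
        # Nearest minority neighbours by squared Euclidean distance, excluding base.
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


def chi_square_score(feature, y):
    """Pearson chi-square statistic between one discrete feature and the class label."""
    n = len(y)
    f_counts, y_counts = Counter(feature), Counter(y)
    observed = Counter(zip(feature, y))
    score = 0.0
    for f, fc in f_counts.items():
        for lab, yc in y_counts.items():
            expected = fc * yc / n
            score += (observed[(f, lab)] - expected) ** 2 / expected
    return score


if __name__ == "__main__":
    # Hypothetical toy data: six non-interacting pairs (0), three interacting pairs (1).
    X_toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
             [0.4, 0.1], [1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
    y_toy = [0] * 6 + [1] * 3
    print(Counter(random_over_sample(X_toy, y_toy)[1]))
    print(Counter(smote_like(X_toy, y_toy, minority=1)[1]))
    print(chi_square_score([1, 1, 0, 0], [1, 1, 0, 0]))
```

ROS only repeats existing minority pairs, whereas the SMOTE-style step creates new points on line segments between minority neighbours, which is why the two can lead to different downstream classifier behaviour. Features with a higher chi-square score are more strongly associated with the interaction label and would be ranked first in a selection step.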

Publisher

Bentham Science Publishers Ltd.
