How to approach machine learning-based prediction of drug/compound–target interactions

Published:2023-02-06 Issue:1 Volume:15 Page:
ISSN:1758-2946
Container-title:Journal of Cheminformatics
language:en
Short-container-title:J Cheminform

Author:

Atas Guvenilir Heval,Doğan Tunca

Abstract

The identification of drug/compound–target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and the novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles.
Our main findings can be summarized in three points: (i) random splitting of datasets into train and test folds leads to near-complete data memorization and produces highly over-optimistic results, and should therefore be avoided; (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties (e.g., structures) of proteins are not used during their self-supervised model training; and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, indicating the need for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
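The abstract does not spell out how the network analysis-based split works, so the following is only a minimal sketch of one common way to obtain structurally different folds, not the authors' procedure. It assumes each sample is described by a set of "on" fingerprint bits, links samples whose Tanimoto similarity reaches a threshold, and assigns whole connected components of that graph to either the training or the test fold. The function names, the threshold of 0.5, the test fraction, and the toy data are all illustrative assumptions.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def component_split(fingerprints, threshold=0.5, test_fraction=0.3):
    """Split sample indices so that no train/test pair reaches `threshold` similarity.

    Hypothetical helper: samples are nodes, an edge joins any pair whose
    similarity is >= threshold, and each connected component is kept in one fold.
    """
    n = len(fingerprints)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of samples that is similar enough.
    for i, j in combinations(range(n), 2):
        if tanimoto(fingerprints[i], fingerprints[j]) >= threshold:
            parent[find(i)] = find(j)

    # Group sample indices by their component root.
    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)

    # Fill the test fold with the smallest components first; the rest is training data.
    train, test = [], []
    for members in sorted(components.values(), key=len):
        if len(test) + len(members) <= test_fraction * n:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


# Toy data (assumed): two clusters of similar samples plus two dissimilar singletons.
toy = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {2, 3, 4, 5},
    {10, 11, 12}, {10, 11, 13},
    {20, 21}, {30, 31, 32},
]
train_idx, test_idx = component_split(toy, threshold=0.5, test_fraction=0.3)
print("train:", train_idx, "test:", test_idx)
```

With a random split, near-duplicate samples from the same cluster can land on both sides, which is the memorization effect described in finding (i). Keeping components intact rules that out at the chosen threshold, at the cost of less control over fold sizes.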

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences, Computer Graphics and Computer-Aided Design, Physical and Theoretical Chemistry, Computer Science Applications

Link

https://link.springer.com/content/pdf/10.1186/s13321-023-00689-w.pdf

References: 84 articles.

1. Rifaioglu AS, Atas H, Martin MJ et al (2019) Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform 20:1878–1912. https://doi.org/10.1093/bib/bby061

2. Rifaioglu AS, Nalbat E, Atalay V et al (2020) DEEPScreen: high performance drug–target interaction prediction with convolutional neural networks using 2-D structural compound representations. Chem Sci 11:2531–2557. https://doi.org/10.1039/C9SC03414E

3. Lavecchia A, Di Giovanni C (2013) Virtual screening strategies in drug discovery: a critical review. Curr Med Chem 20:2839–2860

4. Cortés-Ciriano I, Ain QU, Subramanian V et al (2015) Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. Medchemcomm 6:24–50. https://doi.org/10.1039/C4MD00216D

5. Tabei Y, Pauwels E, Stoven V et al (2012) Identification of chemogenomic features from drug–target interaction networks using interpretable classifiers. Bioinformatics 28:487–494. https://doi.org/10.1093/bioinformatics/bts412

Cited by 10 articles.

1. Impact of Artificial Intelligence on Drug Development and Delivery;Current Topics in Medicinal Chemistry;2024-08-12

2. Polypharmacology prediction: the long road toward comprehensively anticipating small-molecule selectivity to de-risk drug discovery;Expert Opinion on Drug Discovery;2024-07-14

3. Redefining the Game: MVAE-DFDPnet's Low-Dimensional Embeddings for Superior Drug-Protein Interaction Predictions;IEEE Journal of Biomedical and Health Informatics;2024-07

4. MocFormer: A Two-Stage Pre-training-Driven Transformer for Drug–Target Interactions Prediction;International Journal of Computational Intelligence Systems;2024-06-26

5. The recent advances in the approach of artificial intelligence (AI) towards drug discovery;Frontiers in Chemistry;2024-05-31