Insights into performance evaluation of compound–protein interaction prediction methods

Published: 2022-09-01 Issue: Supplement_2 Volume: 38 Page: ii75-ii81
ISSN: 1367-4803
Container-title: Bioinformatics
language: en

Author:

Yaseen Adiba¹, Amin Imran², Akhter Naeem¹, Ben-Hur Asa³, Minhas Fayyaz⁴

Affiliation:

1. Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan

2. National Institute for Biotechnology and Genetic Engineering, Faisalabad 38000, Pakistan

3. Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA

4. Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK

Abstract

Motivation: Machine-learning-based prediction of compound–protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance.

Results: We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in the absence of experimentally verified negative examples and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers and a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using the proposed experiment design strategies can offer significant improvements for CPI prediction, leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins.

Availability and implementation: Code and supplementary material are available at https://github.com/adibayaseen/HKRCPI.

Supplementary information: Supplementary data are available at Bioinformatics online.
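
The random-pairing strategy for synthetic negatives mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the repository linked above). The function name `sample_random_negatives`, the toy identifiers, and the choice to treat every pair absent from the positive set as a candidate negative are all assumptions made for illustration.

```python
import random


def sample_random_negatives(positives, n_negatives, seed=0):
    """Hypothetical helper: pair compounds and proteins uniformly at random,
    skipping any pair that is a known positive (assumed definition of a
    synthetic negative)."""
    rng = random.Random(seed)
    positive_set = set(positives)
    compounds = sorted({c for c, _ in positive_set})
    proteins = sorted({p for _, p in positive_set})

    # Guard against asking for more negatives than unlabeled pairs exist.
    max_negatives = len(compounds) * len(proteins) - len(positive_set)
    if n_negatives > max_negatives:
        raise ValueError("not enough unlabeled pairs to sample from")

    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in positive_set:
            negatives.add(pair)
    return sorted(negatives)


# Toy data: three known interactions over three compounds and two proteins.
positives = [("c1", "pA"), ("c2", "pA"), ("c3", "pB")]
negatives = sample_random_negatives(positives, n_negatives=3)
```

In this toy case only three unlabeled pairs exist, so all of them are returned. On real data the sampled negatives may include undiscovered true interactions, which is one reason the choice of negative-sampling strategy affects evaluation.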

Funder

Pakistan HEC

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

https://academic.oup.com/bioinformatics/article-pdf/38/Supplement_2/ii75/49886471/btac496.pdf

Reference: 38 articles.

1. Choosing negative examples for the prediction of protein–protein interactions;Ben-Hur;BMC Bioinformatics,2006

2. Supervised prediction of drug–target interactions using bipartite local models;Bleakley;Bioinformatics,2009

3. Chemogenomics: an emerging strategy for rapid target and drug discovery;Bredel;Nat. Rev. Genet,2004

4. High-throughput screening for drug discovery;Broach;Nature,1996

5. ChemoPy: freely available python package for computational biology and chemoinformatics;Cao;Bioinformatics,2013

Cited by 4 articles.

1. Improving Compound–Protein Interaction Prediction by Self-Training with Augmenting Negative Samples;Journal of Chemical Information and Modeling;2023-07-17

2. MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem;Metabolites;2023-02-21

3. On the choice of negative examples for prediction of host-pathogen protein interactions;Frontiers in Bioinformatics;2022-12-15

4. Mad Hatter correctly annotates 98% of small molecule tandem mass spectra searching in PubChem;2022-12-12