GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47

Published: 2023-10-20
Journal: Frontiers in Chemistry (Front. Chem.)
Volume: 11
ISSN: 2296-2646

Authors: Shan Wenying, Chen Lvqi, Xu Hao, Zhong Qinghao, Xu Yinqiu, Yao Hequan, Lin Kejiang, Li Xuanyi

Abstract

Identifying compound–protein interactions plays a vital role in drug discovery. Artificial intelligence (AI) methods, especially machine learning (ML) and deep learning (DL) algorithms, are playing an increasingly important role in compound–protein interaction (CPI) prediction. However, ML relies on learning from large datasets, whereas the CPI data available for a specific target are often scarce. To overcome this dilemma, we propose a virtual screening model in which word2vec is used as an embedding tool to generate low-dimensional vectors from the SMILES strings of compounds and the amino acid sequences of proteins, and a modified multi-grained cascade forest based on gcForest is used as the classifier. The proposed method can build a model directly from raw data and adjusts its complexity to the scale of the dataset, which makes it particularly suited to small datasets; it is also robust, has few hyper-parameters, and does not over-fit. We found that the proposed model outperforms other CPI prediction models and performs well on a challenging dataset we constructed. Finally, we predicted two new inhibitors of cluster of differentiation 47 (CD47), a target with few known inhibitors. The IC50 values of these two new small-molecule inhibitors of the CD47–SIRPα interaction are 3.57 and 4.79 μM, respectively. These results demonstrate that this concise but efficient tool is competent for CPI prediction.
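The abstract says word2vec turns SMILES strings and amino acid sequences into low-dimensional vectors, but it does not say how a sequence is split into "words". A common choice, borrowed from ProtVec-style protein embeddings, is to treat overlapping k-mers as words and average their learned vectors. The sketch below assumes that scheme with k = 3 for both compounds and proteins; the helper names `kmer_words` and `average_embedding`, the toy embedding table, and the concatenation of compound and protein vectors into one pair feature are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List, Sequence


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers ("words" for word2vec).

    Assumption: the same overlapping-k-mer scheme is applied to SMILES
    strings and to amino acid sequences.
    """
    if len(seq) < k:
        return [seq] if seq else []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def average_embedding(words: Sequence[str],
                      table: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Average the vectors of in-vocabulary words; return zeros if none are known."""
    known = [table[w] for w in words if w in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Hypothetical 2-dimensional table standing in for a trained word2vec model.
toy_table = {
    "MKT": [1.0, 0.0],
    "KTA": [0.0, 1.0],
    "TAY": [1.0, 1.0],
}

protein_words = kmer_words("MKTAY")
print(protein_words)  # ['MKT', 'KTA', 'TAY']
protein_vec = average_embedding(protein_words, toy_table, dim=2)
print(protein_vec)  # [0.6666666666666666, 0.6666666666666666]

# SMILES k-mers are absent from the toy table, so this falls back to zeros.
compound_vec = average_embedding(kmer_words("CC(=O)O"), toy_table, dim=2)

# Assumed pair representation: compound vector followed by protein vector.
pair_features = compound_vec + protein_vec
```

In a real pipeline the table would come from a word2vec model trained on large corpora of SMILES and protein sequences, so the pair vector has a fixed length regardless of molecule or protein size, which is what lets a tree-based classifier consume it directly.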
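The classifier is described as a modified multi-grained cascade forest based on gcForest, and the claim that model complexity adapts to dataset size rests on the cascade idea. The modifications are not specified here, so the sketch below shows only the cascade stage of the original gcForest design, assuming the embedding vectors are already available as a feature matrix (multi-grained scanning is omitted). Each level's out-of-fold class-probability vectors are appended to the original features for the next level, and levels stop being added once validation accuracy no longer improves. It uses scikit-learn; `ExtraTreesClassifier(max_features=1)` is a stand-in for gcForest's completely-random tree forests, and the function names, forest sizes, `tol`, and synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split


def make_level(seed):
    # One cascade level: a random forest plus an extra-trees forest with
    # max_features=1 (stand-in for gcForest's completely-random forests).
    return [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, max_features=1, random_state=seed),
    ]


def fit_cascade(X_tr, y_tr, X_val, y_val, max_levels=5, tol=1e-3):
    """Grow cascade levels until validation accuracy stops improving.

    Assumes integer class labels 0..n_classes-1 so argmax maps to labels.
    """
    levels, best_acc = [], -np.inf
    aug_tr, aug_val = X_tr, X_val
    for depth in range(max_levels):
        forests = make_level(seed=depth)
        tr_vecs, val_vecs = [], []
        for f in forests:
            # Out-of-fold probabilities keep the class vectors passed to the
            # next level from simply memorizing the training labels.
            tr_vecs.append(cross_val_predict(f, aug_tr, y_tr, cv=3,
                                             method="predict_proba"))
            f.fit(aug_tr, y_tr)
            val_vecs.append(f.predict_proba(aug_val))
        acc = accuracy_score(y_val, np.mean(val_vecs, axis=0).argmax(axis=1))
        if acc <= best_acc + tol:
            break  # no gain: discard this level, so depth adapts to the data
        levels.append(forests)
        best_acc = acc
        # The next level sees the original features plus these class vectors.
        aug_tr = np.hstack([X_tr] + tr_vecs)
        aug_val = np.hstack([X_val] + val_vecs)
    return levels, best_acc


def predict_cascade(levels, X):
    """Run the fitted levels in order; average the last level's outputs."""
    aug = X
    for forests in levels:
        vecs = [f.predict_proba(aug) for f in forests]
        aug = np.hstack([X] + vecs)
    return np.mean(vecs, axis=0).argmax(axis=1)


# Synthetic binary data standing in for compound-protein pair vectors.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0, stratify=y_tmp)

levels, val_acc = fit_cascade(X_tr, y_tr, X_val, y_val)
test_acc = accuracy_score(y_te, predict_cascade(levels, X_te))
print(len(levels), round(val_acc, 3), round(test_acc, 3))
```

Using out-of-fold predictions for the training class vectors is the key design choice: in-sample probabilities from a fitted forest sit very close to the true labels, so the next level would learn to trust them and over-fit. Stopping on a held-out set is what lets a small dataset end up with a shallow cascade and a larger one with a deeper cascade.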

Publisher: Frontiers Media SA
Subject: General Chemistry

Cited by 3 articles.

1. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model. Journal of Cheminformatics, 2024-06-07

2. Development of small molecule drugs targeting immune checkpoints. Cancer Biology & Medicine, 2024-05-09

3. Discovery of Covalent Lead Compounds Targeting 3CL Protease with a Lateral Interactions Spiking Neural Network. Journal of Chemical Information and Modeling, 2024-03-23