Using BERT to identify drug-target interactions from whole PubMed

Published: 2021-09-11

Authors:

Aldahdooh Jehad, Vähä-Koskela Markus, Tang Jing, Tanoli Ziaurrehman

Abstract

Background
Drug-target interactions (DTIs) are critical for drug repurposing and for elucidating drug mechanisms. They are collected in large databases such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the studies that supply these data (~0.1 million) likely make up only a fraction of all PubMed studies containing experimental DTI data. Finding such studies and extracting their experimental information is challenging, and there is a pressing need for machine learning methods to support DTI extraction and curation. To this end, we developed new text-mining document classifiers based on the Bidirectional Encoder Representations from Transformers (BERT) model. Because DTI data depend closely on the type of assay used to generate them, we also aimed to incorporate functions that predict the assay format.

Results
Our method identified and extracted DTIs from 2.1 million studies not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~99% accuracy in identifying studies containing drug-target pairs. Accuracy for assay-format prediction is ~90%, which leaves room for improvement in future studies.

Conclusion
The BERT model in this study is robust, and the proposed pipeline can be used to identify new and previously overlooked studies containing DTIs and to extract the DTI data points automatically. The tabular output facilitates validation of the extracted data and the assay format information. Overall, our method is a significant advance in machine-assisted DTI extraction and curation, and we expect it to be a useful addition to drug mechanism discovery and repurposing.
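The ~99% figure above comes from 10-fold cross-validation. The Python sketch below shows, under stated assumptions, how such an accuracy is usually computed: shuffle the labelled documents, split them into 10 folds, train on 9, score on the held-out fold, and average the 10 fold accuracies. It is not the authors' pipeline. `train_keyword_baseline` is a hypothetical stand-in for fine-tuning BERT. The helper names `k_fold_indices` and `cross_validated_accuracy`, the `Classifier`/`Trainer` type aliases, and the toy documents are all illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a document is plain text; label 1 = contains DTI data, 0 = does not.
Classifier = Callable[[str], int]
Trainer = Callable[[Sequence[str], Sequence[int]], Classifier]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them round-robin into k folds of near-equal size."""
    if n < k:
        raise ValueError("need at least k documents so that every fold is non-empty")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(
    docs: Sequence[str], labels: Sequence[int], train: Trainer, k: int = 10, seed: int = 0
) -> Tuple[float, List[float]]:
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times, and average."""
    fold_acc: List[float] = []
    for test_idx in k_fold_indices(len(docs), k, seed):
        held_out = set(test_idx)
        train_idx = [j for j in range(len(docs)) if j not in held_out]
        model = train([docs[j] for j in train_idx], [labels[j] for j in train_idx])
        correct = sum(model(docs[j]) == labels[j] for j in test_idx)
        fold_acc.append(correct / len(test_idx))
    return sum(fold_acc) / k, fold_acc


def train_keyword_baseline(docs: Sequence[str], labels: Sequence[int]) -> Classifier:
    """Hypothetical stand-in for fine-tuning BERT: mark a document positive if it
    contains any token that occurred only in positive training documents."""
    pos_tokens, neg_tokens = set(), set()
    for doc, y in zip(docs, labels):
        (pos_tokens if y == 1 else neg_tokens).update(doc.lower().split())
    keywords = pos_tokens - neg_tokens
    return lambda doc: int(any(tok in keywords for tok in doc.lower().split()))


if __name__ == "__main__":
    # Toy, made-up abstracts: 10 "DTI" positives and 10 negatives with disjoint vocabularies.
    positives = [f"drug{i} inhibits kinase{i} ic50 measured" for i in range(10)]
    negatives = [f"review{i} discusses epidemiology survey" for i in range(10)]
    docs = positives + negatives
    labels = [1] * 10 + [0] * 10
    mean_acc, per_fold = cross_validated_accuracy(docs, labels, train_keyword_baseline)
    print(f"10-fold mean accuracy: {mean_acc:.3f}")
```

On this deliberately separable toy data every fold scores 1.0. The point is the evaluation loop, not the baseline. With real, imbalanced corpora one would typically stratify the folds by label so that each fold keeps the overall positive/negative ratio.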

Publisher

Cold Spring Harbor Laboratory
