Using BERT to identify drug-target interactions from whole PubMed

Author:

Aldahdooh Jehad,Vähä-Koskela Markus,Tang Jing,Tanoli Ziaurrehman

Abstract

ABSTRACTBackgroundDrug-target interactions (DTIs) are critical for drug repurposing and elucidation of drug mechanisms, and they are collected in large databases, such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the number of studies providing this data (~0.1 million) likely constitutes only a fraction of all studies on PubMed that contain experimental DTI data. Finding such studies and extracting the experimental information is a challenging task, and there is a pressing need for machine learning for the extraction and curation of DTIs. To this end, we developed new text mining document classifiers based on the Bidirectional Encoder Representations from Transformers (BERT) algorithm. Because DTI data intimately depends on the type of assays used to generate it, we also aimed to incorporate functions to predict the assay format.ResultsOur novel method identified and extracted DTIs from 2.1 million studies not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~99% accuracy for identifying studies containing drug-target pairs. The accuracy for the prediction of assay format is ~90%, which leaves room for improvement in future studies.ConclusionThe BERT model in this study is robust and the proposed pipeline can be used to identify new and previously overlooked studies containing DTIs and automatically extract the DTI data points. The tabular output facilitates validation of the extracted data and assay format information. Overall, our method provides a significant advancement in machine-assisted DTI extraction and curation. We expect it to be a useful addition to drug mechanism discovery and repurposing.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3