BioKG: a comprehensive, large-scale biomedical knowledge graph for AI-powered, data-driven biomedical research

Author:

Zhang Yuan, Sui Xin, Pan Feng, Yu Kaixian, Li Keqiao, Tian Shubo, Erdengasileng Arslan, Han Qing, Wang Wanjing, Wang Jianan, Wang Jian, Sun Donghu, Chung Henry, Zhou Jun, Zhou Eric, Lee Ben, Zhang Peili, Qiu Xing, Zhao Tingting, Zhang Jinfeng

Abstract

To cope with the rapid growth of scientific publications and data in biomedical research, knowledge graphs (KGs) have emerged as a powerful data structure for integrating large volumes of heterogeneous data to facilitate accurate and efficient information retrieval and automated knowledge discovery (AKD). However, transforming unstructured content from scientific literature into KGs has remained a significant challenge, with previous methods unable to achieve human-level accuracy. In this study, we utilized an information extraction pipeline that won first place in the LitCoin NLP Challenge to construct a large-scale KG using all PubMed abstracts. The quality of the large-scale information extraction rivals that of human expert annotations, signaling a new era of automatic, high-quality database construction from literature. Our extracted information markedly surpasses the amount of content in manually curated public databases. To enhance the KG’s comprehensiveness, we integrated relation data from 40 public databases and relation information inferred from high-throughput genomics data. The comprehensive KG enabled rigorous performance evaluation of AKD, which was infeasible in previous studies. We designed an interpretable, probability-based inference method to identify indirect causal relations and achieved unprecedented results for drug target identification and drug repurposing. Taking lung cancer as an example, a retrospective study found that 40% of the drug targets reported in the literature could have been predicted by our algorithm about 15 years ago, demonstrating that substantial acceleration in scientific discovery could be achieved through automated hypothesis generation and timely dissemination. A cloud-based platform (https://www.biokde.com) was developed for academic users to freely access this rich structured data and associated tools.

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.