BioKG: a comprehensive, large-scale biomedical knowledge graph for AI-powered, data-driven biomedical research
Authors:
Zhang Yuan, Sui Xin, Pan Feng, Yu Kaixian, Li Keqiao, Tian Shubo, Erdengasileng Arslan, Han Qing, Wang Wanjing, Wang Jianan, Wang Jian, Sun Donghu, Chung Henry, Zhou Jun, Zhou Eric, Lee Ben, Zhang Peili, Qiu Xing, Zhao Tingting, Zhang Jinfeng
Abstract
To cope with the rapid growth of scientific publications and data in biomedical research, knowledge graphs (KGs) have emerged as a powerful data structure for integrating large volumes of heterogeneous data to support accurate, efficient information retrieval and automated knowledge discovery (AKD). However, transforming unstructured content from the scientific literature into KGs has remained a significant challenge, and previous methods have not achieved human-level accuracy. In this study, we used an information extraction pipeline that won first place in the LitCoin NLP Challenge to construct a large-scale KG from all PubMed abstracts. The quality of this large-scale information extraction rivals that of human expert annotations, signaling a new era of automatic, high-quality database construction from the literature. The extracted information markedly surpasses the content of manually curated public databases. To make the KG more comprehensive, we integrated relation data from 40 public databases and relation information inferred from high-throughput genomics data. The comprehensive KG enabled a rigorous performance evaluation of AKD, which was infeasible in previous studies. We designed an interpretable, probabilistic inference method to identify indirect causal relations and achieved unprecedented results for drug target identification and drug repurposing. Taking lung cancer as an example, a retrospective study showed that 40% of the drug targets reported in the literature could have been predicted by our algorithm about 15 years earlier, demonstrating that automated hypothesis generation and timely dissemination could substantially accelerate scientific discovery. A cloud-based platform (https://www.biokde.com) was developed to give academic users free access to this structured data and the associated tools.
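The abstract does not describe how indirect relations are scored or how the retrospective evaluation is set up. The following is only a minimal, generic illustration of those two ideas and is not the authors' method. It assumes a toy KG stored as (head, relation, tail) triples, each carrying a hypothetical confidence and first-reported year. A candidate pair is scored by combining all two-hop paths with a noisy-OR, which is a swapped-in technique chosen for simplicity. A time-sliced check then asks how many relations first reported after a cutoff year were already scored highly using only pre-cutoff edges. All entity names, scores, years, and function names (`build_index`, `indirect_score`, `retrospective_recall`) are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One KG triple plus hypothetical metadata (illustrative assumption)."""
    head: str
    relation: str
    tail: str
    confidence: float  # assumed probability that the extracted relation is true
    year: int          # assumed year the relation first appeared in the literature


def build_index(edges, cutoff_year):
    """Index edges by head entity, keeping only those reported up to cutoff_year."""
    index = defaultdict(list)
    for e in edges:
        if e.year <= cutoff_year:
            index[e.head].append(e)
    return index


def indirect_score(index, source, target):
    """Noisy-OR over all two-hop paths source -> intermediate -> target.

    Each path counts as independent evidence with probability equal to the
    product of its edge confidences; the pair score is 1 - prod(1 - p_path).
    """
    miss = 1.0
    for first in index.get(source, []):
        for second in index.get(first.tail, []):
            if second.tail == target:
                miss *= 1.0 - first.confidence * second.confidence
    return 1.0 - miss


def retrospective_recall(edges, cutoff_year, relation, threshold):
    """Fraction of `relation` edges first reported after cutoff_year that the
    pre-cutoff graph already scored at or above `threshold`."""
    index = build_index(edges, cutoff_year)
    later = [e for e in edges if e.relation == relation and e.year > cutoff_year]
    if not later:
        return 0.0
    hits = sum(indirect_score(index, e.head, e.tail) >= threshold for e in later)
    return hits / len(later)


# Toy graph: the direct geneA -> lung cancer link only appears in 2015.
edges = [
    Edge("geneA", "regulates", "geneB", 0.9, 2005),
    Edge("geneB", "associated_with", "lung cancer", 0.8, 2004),
    Edge("geneA", "interacts_with", "geneC", 0.5, 2006),
    Edge("geneC", "associated_with", "lung cancer", 0.6, 2006),
    Edge("geneA", "associated_with", "lung cancer", 1.0, 2015),
]

score = indirect_score(build_index(edges, 2006), "geneA", "lung cancer")
print(round(score, 3))  # 0.804
print(retrospective_recall(edges, 2006, "associated_with", 0.5))  # 1.0
```

The noisy-OR treats paths as independent. This keeps each score traceable to the specific paths behind it, but it overstates confidence when paths share edges. A real system would need a more careful probabilistic model.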
Publisher
Cold Spring Harbor Laboratory
References: 74 articles.
Cited by: 2 articles.