Building a PubMed knowledge graph

Published:2020-06-26 Issue:1 Volume:7 Page:
ISSN:2052-4463
Container-title:Scientific Data
language:en
Short-container-title:Sci Data

Author:

Xu Jian, Kim Sunkyu, Song Min, Jeong Minbyul, Kim Donghyeon, Kang Jaewoo, Rousseau Justin F., Li Xin, Xu Weijia, Torvik Vetle I., Bu Yi, Chen Chongyan, Ebeid Islam Akef, Li Daifeng, Ding Ying

Abstract

PubMed® is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil. Through the integration of these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities.
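To make the idea of connecting bio-entities, authors, articles, and funding concrete, here is a minimal sketch in Python. It is not the PKG schema or the authors' pipeline: the records, identifiers, field names, and the helper functions `build_graph` and `author_profile` are all hypothetical, and affiliations are left out for brevity. It only illustrates how an author could be profiled by the bio-entities mentioned in their articles once everything is linked through article nodes.

```python
from collections import defaultdict

# Hypothetical toy records; identifiers and field names are illustrative
# assumptions, not the actual PKG data model.
articles = [
    {"pmid": "A1", "authors": ["author_1", "author_2"],
     "entities": ["BRCA1", "breast cancer"], "grants": ["grant_X"]},
    {"pmid": "A2", "authors": ["author_1"],
     "entities": ["BRCA1", "tamoxifen"], "grants": []},
]

def build_graph(records):
    """Return an undirected adjacency map over typed nodes, e.g. ('author', 'author_1')."""
    graph = defaultdict(set)
    for rec in records:
        article = ("article", rec["pmid"])
        neighbours = (
            [("author", a) for a in rec["authors"]]
            + [("entity", e) for e in rec["entities"]]
            + [("grant", g) for g in rec["grants"]]
        )
        # Every author, entity, and grant is linked to the article it appears on.
        for node in neighbours:
            graph[article].add(node)
            graph[node].add(article)
    return graph

def author_profile(graph, author):
    """Count how many of the author's articles mention each bio-entity."""
    counts = defaultdict(int)
    for article in graph[("author", author)]:
        for kind, name in graph[article]:
            if kind == "entity":
                counts[name] += 1
    return dict(counts)

graph = build_graph(articles)
profile = author_profile(graph, "author_1")
```

With these toy records, `profile` maps `BRCA1` to 2 and the other two entities to 1 each. A real graph at the scale described above would need disambiguated author identifiers and a proper graph store rather than in-memory sets.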

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

http://www.nature.com/articles/s41597-020-0543-2.pdf

References: 69 articles.

1. Hakala, K., Kaewphan, S., Salakoski, T. & Ginter, F. Syntactic analyses and named entity recognition for PubMed and PubMed Central—up-to-the-minute. In Proceedings of the 15th Workshop on Biomedical Natural Language Processing 102–107, https://doi.org/10.18653/v1/W16-2913 (2016).

2. Bell, L., Chowdhary, R., Liu, J. S., Niu, X. & Zhang, J. Integrated bio-entity network: a system for biological knowledge discovery. PLoS One 6, e21474 (2011).

3. Torvik, V. I. MapAffil: a bibliographic tool for mapping author affiliation strings to cities and their geocodes worldwide. Dlib Mag. 21, 11–12, https://doi.org/10.1045/november2015-torvik (2015).

4. Achakulvisut, T. Affiliation parser. GitHub, https://github.com/titipata/affiliation_parser/wiki (2017).

5. Torvik, V. I. & Smalheiser, N. R. Author name disambiguation in MEDLINE. ACM Trans. Knowl. Discov. Data 3, 11, https://doi.org/10.1145/1552303.1552304 (2009).

Cited by 116 articles.

1. FoodAtlas: Automated knowledge extraction of food and chemicals from literature;Computers in Biology and Medicine;2024-10

2. Community knowledge graph abstraction for enhanced link prediction: A study on pubmed knowledge graph;Journal of Biomedical Informatics;2024-09

3. Land Use Thematic Maps Recommendation Based on Pan-Map Visualization Dimension Theory;Land;2024-08-29

4. Measuring the labor market outcomes of universities: evidence from China’s listed company executives;Scientometrics;2024-08-24

5. Investigating clinical links in edge-labeled citation networks of biomedical research: A translational science perspective;Journal of Informetrics;2024-08