Author:
Nguyen Huyen, Chen Haihua, Chen Jiangping, Kargozari Kate, Ding Junhua
Abstract
Purpose
This study aims to evaluate a method of building a biomedical knowledge graph (KG).
Design/methodology/approach
This research first constructs a COVID-19 KG from the COVID-19 Open Research Data Set, covering six categories of information (disease, drug, gene, species, therapy and symptom). Open-source tools are used to extract entities, relations and triples. The COVID-19 KG is then evaluated on three data-quality dimensions (correctness, relatedness and comprehensiveness) using a semiautomatic approach. Finally, the study assesses the KG in application by building a question answering (Q&A) system: five queries on COVID-19 genomes, symptoms, transmissions and therapeutics are submitted to the system and the results are analyzed.
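As a rough illustration of the triple-based representation and Q&A lookup described above, the following minimal sketch stores (subject, relation, object) triples and answers a single-hop query. The triples, the relation names and the functions `build_index` and `answer` are hypothetical and chosen only for illustration; they are not taken from the paper's KG or its tooling.

```python
from collections import defaultdict

# Illustrative triples only; they are NOT taken from the paper's KG.
TRIPLES = [
    ("COVID-19", "has_symptom", "fever"),
    ("COVID-19", "has_symptom", "cough"),
    ("COVID-19", "caused_by", "SARS-CoV-2"),
]


def build_index(triples):
    """Index triples by (subject, relation) for direct lookup."""
    index = defaultdict(set)
    for subj, rel, obj in triples:
        index[(subj, rel)].add(obj)
    return index


def answer(index, subject, relation):
    """Return the sorted objects linked to `subject` via `relation`."""
    return sorted(index.get((subject, relation), set()))


index = build_index(TRIPLES)
symptoms = answer(index, "COVID-19", "has_symptom")
print(symptoms)
```

A query whose subject and relation have no matching triple returns an empty list, which is one simple way a gap in KG coverage could surface in a Q&A setting.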
Findings
With current extraction tools, the quality of the KG is moderate and is difficult to improve unless the tools for entity extraction, relation extraction and related tasks are themselves improved. The study finds that comprehensiveness and relatedness correlate positively with data size. Furthermore, for most queries, the Q&A systems built on larger-scale KGs perform better than those built on smaller ones, which points to the importance of relatedness and comprehensiveness for the usefulness of the KG.
Originality/value
The KG construction process and the data-quality-based and application-based evaluations discussed in this paper provide valuable references for KG researchers and practitioners building high-quality domain-specific knowledge discovery systems.
Subject
Library and Information Sciences, General Computer Science