Affiliation:
1. Institute of Medical Technology, Peking University Health Science Center, Beijing, China
2. National Institute of Health Data Science, Peking University, Beijing, China
Abstract
Aim: It is essential for health researchers to have a systematic understanding of third‐party variables that influence both the exposure and the outcome under investigation, as represented in a directed acyclic graph (DAG). The traditional construction of DAGs through literature review and expert knowledge often lacks systematicity and consistency, which can introduce bias. We introduce an automated approach to building a network that links variables of interest.
Methods: We used large‐scale text mining of the medical literature to construct a conceptual network based on the Semantic MEDLINE Database (SemMedDB), a PubMed‐scale repository of triples in “concept‐relation‐concept” format. Relations between concepts are categorized as Excitatory, Inhibitory, or General.
Results: To facilitate the use of the large‐scale triple sets in SemMedDB, we developed a computable biomedical knowledge (CBK) system (https://cbk.bjmu.edu.cn/), a website that enables direct retrieval of related publications and their corresponding triples without writing SQL statements. Three case studies are presented to demonstrate applications of the CBK system.
Conclusions: The CBK system is openly available and user‐friendly, allowing users to rapidly capture a set of factors that influence a phenotype and to build candidate DAGs linking exposure and outcome variables. It could reduce the time spent exploring relationships between variables and constructing a DAG. A reliable, standardized DAG could significantly improve the design and interpretation of observational health research.
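The idea of screening concept‐relation‐concept triples for variables that might enter a DAG can be illustrated with a minimal Python sketch. It assumes the triples are already available in memory as (subject, relation category, object) tuples; the placeholder concept names, the functions `build_network`, `candidate_confounders`, and `candidate_mediators`, and the selection rules are hypothetical illustrations, not the CBK system's implementation or SemMedDB's schema. A concept with edges into both the exposure and the outcome is flagged as a candidate common cause (possible confounder), and a concept on an exposure → concept → outcome path as a candidate mediator.

```python
from collections import defaultdict

# Hypothetical triples for illustration only; NOT taken from SemMedDB.
# Each triple is (subject concept, relation category, object concept), where the
# category is one of "Excitatory", "Inhibitory", or "General".
TRIPLES = [
    ("Factor A", "Excitatory", "Exposure"),
    ("Factor A", "Excitatory", "Outcome"),
    ("Factor B", "Inhibitory", "Exposure"),
    ("Factor B", "General", "Outcome"),
    ("Exposure", "Excitatory", "Factor C"),
    ("Factor C", "Excitatory", "Outcome"),
    ("Exposure", "Excitatory", "Outcome"),
    ("Factor D", "General", "Outcome"),
]


def build_network(triples):
    """Map each subject concept to {object concept: set of relation categories}."""
    graph = defaultdict(lambda: defaultdict(set))
    for subj, rel, obj in triples:
        graph[subj][obj].add(rel)
    return graph


def candidate_confounders(graph, exposure, outcome):
    """Concepts with a direct edge into both the exposure and the outcome."""
    return sorted(
        concept
        for concept, targets in graph.items()
        if concept not in (exposure, outcome)
        and exposure in targets
        and outcome in targets
    )


def candidate_mediators(graph, exposure, outcome):
    """Concepts on a two-step path exposure -> concept -> outcome."""
    return sorted(
        concept
        for concept in graph.get(exposure, {})
        if concept not in (exposure, outcome) and outcome in graph.get(concept, {})
    )


if __name__ == "__main__":
    network = build_network(TRIPLES)
    print(candidate_confounders(network, "Exposure", "Outcome"))  # ['Factor A', 'Factor B']
    print(candidate_mediators(network, "Exposure", "Outcome"))    # ['Factor C']
```

Relations extracted from text are not necessarily causal, so candidates produced this way would still need expert review before being drawn into a DAG.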
Funder
National Natural Science Foundation of China