The use of metadata-driven approaches for data harmonization in the medical domain: a scoping review (Preprint)

Authors

Peng Yuan, Bathelt Franziska, Gebler Richard, Gött Robert, Heidenreich Andreas, Henke Elisa, Kadioglu Dennis, Lorenz Stephan, Vengadeswaran Abishaa, Sedlmayr Martin

Abstract

BACKGROUND

Multi-site clinical studies increasingly use real-world data (RWD) to generate real-world evidence (RWE). However, because the source data are heterogeneous, it is difficult to analyze them in a unified way across clinics. Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) processes are therefore needed to harmonize local health data and to ensure data quality for research. Yet developing such processes is time-consuming and unsustainable. A promising way to ease this burden is to generalize ETL/ELT processes.

OBJECTIVE

In this work, we investigate existing possibilities for developing generic ETL/ELT processes. In particular, we focus on approaches that keep development complexity low by using descriptive and structural metadata.

METHODS

We conducted a literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched four publication databases (i.e., PubMed, IEEE Xplore, Web of Science, and BioMed Central) for relevant publications from 2012 to 2022. The PRISMA flow was then visualized using an R-based online tool. All relevant content from the publications was extracted into a spreadsheet for further analysis and visualization.

RESULTS

Following the PRISMA guidelines, we included 33 publications in this literature review. All included publications were categorized into seven focus groups (i.e., Medicine, Data warehouse, Big Data, Industry, Geoinformatics, Archaeology, and Military). Based on the extracted data, ontology-based and rule-based approaches were the two most frequently used approaches across the focus groups. The ontology-based approach was mostly implemented using Protégé, while the rule-based approach was mostly implemented manually.

CONCLUSIONS

Our literature review shows that metadata-driven approaches to developing ETL/ELT processes can serve different purposes in different focus groups. In some cases, combining multiple metadata-driven approaches can provide more opportunities for the development of ETL/ELT processes. Future work should therefore verify whether combining multiple metadata-driven approaches can improve ETL/ELT processes for harmonizing medical data.

Publisher

JMIR Publications Inc.
