Affiliation:
1. Technische Universität Dresden, Medizinische Fakultät Carl Gustav Carus
2. Technische Universität Dresden
3. Goethe University Frankfurt: Goethe-Universität Frankfurt am Main
4. Icahn School of Medicine at Mount Sinai Tisch Cancer Institute
5. The Ohio State University Comprehensive Cancer Center
Abstract
Background
Because patients with Rare Diseases (RDs) are geographically sparse, assembling a cohort is often challenging. Common Data Models (CDMs) can harmonize disparate data sources, which can then serve as the basis for decision support systems and artificial intelligence-based studies and lead to new insights in the field. This work seeks to support the design of large-scale multi-center studies for rare diseases.
Methods
In an interdisciplinary group, we iteratively derived a list of RD data elements in three medical domains (endocrinology, gastroenterology, and pneumonology) based on specialist knowledge and clinical guidelines. We then defined an RD data structure that covered all our data elements and built Extract, Transform, Load (ETL) processes to transfer the structure to a joint CDM. To ensure interoperability of the developed CDM and its subsequent use for further RD domains, we ultimately mapped it to the Observational Medical Outcomes Partnership (OMOP) CDM. We then included a fourth domain, hematology, as a proof of concept and mapped an acute myeloid leukemia (AML) dataset to the developed CDM.
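The kind of ETL mapping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `SourcePatient` structure, the helper functions, and the condition lookup table are hypothetical, and the condition concept ID is a placeholder in the range OMOP reserves for local concepts. Only the OMOP column names and the standard gender concept IDs (8507 for male, 8532 for female) come from the OMOP CDM itself.

```python
from dataclasses import dataclass, field
from datetime import date

# Standard OMOP gender concept IDs; 0 means "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}

# Hypothetical source-code lookup. A real ETL would query the OMOP
# vocabulary tables; the ID below is a placeholder local concept.
CONDITION_CONCEPTS = {"C92.0": 2000000001}  # ICD-10 C92.0: acute myeloid leukemia


@dataclass
class SourcePatient:
    """Hypothetical source record holding demographics and diagnoses."""
    patient_id: int
    sex: str
    birth_year: int
    diagnoses: list = field(default_factory=list)  # (icd10_code, start_date) pairs


def to_person(p: SourcePatient) -> dict:
    """Map demographics to a row shaped like the OMOP PERSON table."""
    return {
        "person_id": p.patient_id,
        "gender_concept_id": GENDER_CONCEPTS.get(p.sex.lower(), 0),
        "year_of_birth": p.birth_year,
        "gender_source_value": p.sex,
    }


def to_condition_occurrences(p: SourcePatient) -> list:
    """Map diagnoses to rows shaped like the OMOP CONDITION_OCCURRENCE table."""
    return [
        {
            "person_id": p.patient_id,
            "condition_concept_id": CONDITION_CONCEPTS.get(code, 0),
            "condition_start_date": start,
            "condition_source_value": code,  # keep the source code for traceability
        }
        for code, start in p.diagnoses
    ]


patient = SourcePatient(1, "Female", 1970, [("C92.0", date(2020, 5, 1))])
person_row = to_person(patient)
condition_rows = to_condition_occurrences(patient)
```

Unmapped source values fall back to concept ID 0 while the original value is kept in the `*_source_value` column, so that no source information is lost during the transfer.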
Results
We developed an OMOP-based rare diseases common data model (RD-CDM) using data elements from the three domains (endocrinology, gastroenterology, and pneumonology) and tested it using data from the hematology domain. The total study cohort included 61,697 patients. After aligning our modules with those of the Medical Informatics Initiative (MII) Core Dataset (CDS), we reused its ETL process. This allowed the demographic information, diagnoses, procedures, laboratory results, and medication modules to be transferred from our RD-CDM to the OMOP CDM. For the phenotypes and genotypes, we developed a second ETL process. Finally, we derived lessons learned for customizing our RD-CDM for different RDs.
Discussion
This work can serve as a blueprint for other domains, as its modular structure can be extended to novel data types. Reaching a comprehensive CDM requires an interdisciplinary group of stakeholders who actively support the project's progress.
Conclusion
The customized data structure of our RD-CDM can be used to perform multi-center studies that test data-driven hypotheses on a larger scale and take advantage of the analytical tools offered by the Observational Health Data Sciences and Informatics (OHDSI) community.
Publisher
Research Square Platform LLC