Spot the difference: comparing results of analyses from real patient data and synthetic derivatives

Published:2020-12-01 Issue:4 Volume:3 Page:557-566
ISSN:2574-2531
Container-title:JAMIA Open
language:en

Author:

Foraker Randi E¹², Yu Sean C², Gupta Aditi², Michelson Andrew P³, Pineda Soto Jose A⁴, Colvin Ryan²⁴, Loh Francis⁵, Kollef Marin H³, Maddox Thomas⁶, Evanoff Bradley¹, Dror Hovav⁷, Zamstein Noa⁷, Lai Albert M¹², Payne Philip R O¹²

Affiliation:

1. Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA

2. Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA

3. Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA

4. Division of Critical Care Medicine, Department of Anesthesiology and Critical Care Medicine, Children’s Hospital of Los Angeles, Los Angeles, California, USA

5. School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA

6. Healthcare Innovation Lab, BJC Healthcare, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA

7. MDClone Ltd, Beer Sheva, Israel

Abstract

Background: Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification.

Objectives: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns.

Methods: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses using synthetic derivatives to analyses using the original data using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases with the purpose of conducting analyses at the observation level (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3).

Results: For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions.

Discussion and conclusion: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.

Funder

WUSM-Pediatric Neurocritical Care Program

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

http://academic.oup.com/jamiaopen/article-pdf/3/4/557/36625809/ooaa060.pdf

Reference: 18 articles.

1. Are synthetic data derivatives the future of translational medicine?;Foraker;JACC Basic Transl Sci,2018

2. Challenges and Opportunities in Secondary Analyses of Electronic Health Record Data

3. Privacy protection and technology diffusion: the case of electronic medical records;Miller;Manag Sci,2009

Cited by 37 articles.

1. Synthetic data can aid the analysis of clinical outcomes: How much can it be trusted?;Proceedings of the National Academy of Sciences;2024-07-31

2. Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study;Journal of Medical Internet Research;2024-07-30

3. The Privacy-Preserving High-Dimensional Synthetic Data Generation and Evaluation in the Healthcare Domain;Advances in Data Mining and Database Management;2024-04-19

4. An evaluation of the replicability of analyses using synthetic health data;Scientific Reports;2024-03-24

5. Synthetic data in cancer and cerebrovascular disease research: A novel approach to big data;PLOS ONE;2024-02-07