Can synthetic data be a proxy for real clinical trial data? A validation study

Author:

Azizi Zahra, Zheng Chaoyi, Mosquera Lucy, Pilote Louise, El Emam Khaled

Abstract

Objectives: There are increasing requirements to make research data, especially clinical trial data, more broadly available for secondary analyses. However, data availability remains a challenge due to complex privacy requirements. This challenge can potentially be addressed using synthetic data.

Setting: Replication of a published stage III colon cancer trial secondary analysis using synthetic data generated by a machine learning method.

Participants: The analysis included 1543 patients from the control arm.

Primary and secondary outcome measures: Analyses from a study published on the real dataset were replicated on synthetic data to investigate the relationship between bowel obstruction and event-free survival. Information theoretic metrics were used to compare the univariate distributions between real and synthetic data. Percentage CI overlap was used to assess the similarity in the size of the bivariate relationships, and likewise for the multivariate Cox models derived from the two datasets.

Results: Analysis results were similar between the real and synthetic datasets. The univariate distributions differed by less than 1% on an information theoretic metric. All of the bivariate relationships had CI overlap on the tau statistic above 50%. The main conclusion from the published study, that lack of bowel obstruction has a strong impact on survival, was replicated directionally. The HR CI overlap between the real and synthetic data was 61% for overall survival (real data: HR 1.56, 95% CI 1.11 to 2.2; synthetic data: HR 2.03, 95% CI 1.44 to 2.87) and 86% for disease-free survival (real data: HR 1.51, 95% CI 1.18 to 1.95; synthetic data: HR 1.63, 95% CI 1.26 to 2.1).

Conclusions: The high concordance between the analytical results and conclusions from synthetic and real data suggests that synthetic data can be used as a reasonable proxy for real clinical trial datasets.

Trial registration number: NCT00079274.
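The percentage CI overlap reported in the Results can be illustrated with a short sketch. The abstract does not spell out the formula, so the definition below is an assumption: the length of the region shared by the two intervals, expressed as a fraction of each interval's width and then averaged. The function name `ci_overlap` and the choice to return zero for disjoint intervals are hypothetical, chosen for illustration only. The intervals are the 95% CIs for the hazard ratios quoted in the abstract, taken on the untransformed HR scale.

```python
def ci_overlap(real_ci, synth_ci):
    """Average fractional overlap of two confidence intervals.

    Assumed definition (not stated in the abstract): the shared length
    divided by each interval's width, averaged over the two intervals.
    """
    real_lo, real_hi = real_ci
    synth_lo, synth_hi = synth_ci

    # Length of the region covered by both intervals.
    shared = min(real_hi, synth_hi) - max(real_lo, synth_lo)
    if shared <= 0:
        # Disjoint intervals are treated as zero overlap in this sketch.
        return 0.0

    real_width = real_hi - real_lo
    synth_width = synth_hi - synth_lo
    return 0.5 * (shared / real_width + shared / synth_width)


# 95% CIs for the hazard ratios quoted in the abstract (real, synthetic).
os_overlap = ci_overlap((1.11, 2.2), (1.44, 2.87))
dfs_overlap = ci_overlap((1.18, 1.95), (1.26, 2.1))

print(f"Overall survival CI overlap: {os_overlap:.0%}")
print(f"Disease-free survival CI overlap: {dfs_overlap:.0%}")
```

Under this assumed definition the two values round to 61% and 86%, which agrees with the figures in the abstract. That agreement is consistent with the definition but does not confirm it is the exact computation the authors used.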

Funder

The GOING-FWD Consortium is funded by the GENDER-NET Plus ERA-NET Initiative

Canadian Institutes of Health Research

Natural Sciences and Engineering Research Council of Canada

Publisher

BMJ

Subject

General Medicine

