Abstract
Objectives: There are increasing requirements to make research data, especially clinical trial data, more broadly available for secondary analyses. However, data availability remains a challenge due to complex privacy requirements. This challenge can potentially be addressed using synthetic data.
Setting: Replication of a published stage III colon cancer trial secondary analysis using synthetic data generated by a machine learning method.
Participants: There were 1543 patients in the control arm that were included in our analysis.
Primary and secondary outcome measures: Analyses from a study published on the real dataset were replicated on synthetic data to investigate the relationship between bowel obstruction and event-free survival. Information theoretic metrics were used to compare the univariate distributions between real and synthetic data. Percentage CI overlap was used to assess the similarity in the size of the bivariate relationships, and similarly for the multivariate Cox models derived from the two datasets.
Results: Analysis results were similar between the real and synthetic datasets. The univariate distributions were within 1% of difference on an information theoretic metric. All of the bivariate relationships had CI overlap on the tau statistic above 50%. The main conclusion from the published study, that lack of bowel obstruction has a strong impact on survival, was replicated directionally and the HR CI overlap between the real and synthetic data was 61% for overall survival (real data: HR 1.56, 95% CI 1.11 to 2.2; synthetic data: HR 2.03, 95% CI 1.44 to 2.87) and 86% for disease-free survival (real data: HR 1.51, 95% CI 1.18 to 1.95; synthetic data: HR 1.63, 95% CI 1.26 to 2.1).
Conclusions: The high concordance between the analytical results and conclusions from synthetic and real data suggests that synthetic data can be used as a reasonable proxy for real clinical trial datasets.
Trial registration number: NCT00079274.
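The abstract does not define "percentage CI overlap". The sketch below assumes one commonly used definition: the length of the intersection of the two intervals, divided by the length of each interval in turn, with the two fractions averaged. The function name `ci_overlap` and the choice to clip non-overlapping intervals to zero are illustrative assumptions, not taken from the paper. Under this assumption, the hazard ratio intervals quoted above give values that match the reported 61% and 86%.

```python
def ci_overlap(real_lo, real_hi, synth_lo, synth_hi):
    """Average fraction of each interval covered by their intersection.

    Assumed definition (not stated in the abstract): the shared length is
    divided by each interval's own length and the two ratios are averaged.
    Disjoint intervals are clipped to 0 here; that is a design choice.
    """
    if real_hi <= real_lo or synth_hi <= synth_lo:
        raise ValueError("each interval needs upper bound > lower bound")
    shared = max(0.0, min(real_hi, synth_hi) - max(real_lo, synth_lo))
    real_frac = shared / (real_hi - real_lo)
    synth_frac = shared / (synth_hi - synth_lo)
    return 0.5 * (real_frac + synth_frac)


# 95% CIs for the hazard ratios quoted in the abstract.
overall = ci_overlap(1.11, 2.20, 1.44, 2.87)       # overall survival
disease_free = ci_overlap(1.18, 1.95, 1.26, 2.10)  # disease-free survival

print(f"overall survival: {100 * overall:.0f}%")            # 61%
print(f"disease-free survival: {100 * disease_free:.0f}%")  # 86%
```

The overlap here is computed directly on the hazard ratio scale. Computing it on the log hazard ratio scale would be another reasonable choice and would give slightly different numbers.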
Funder
The GOING-FWD Consortium is funded by the GENDER-NET Plus ERA-NET Initiative
Canadian Institutes of Health Research
Natural Sciences and Engineering Research Council of Canada
Cited by
54 articles.