Abstract
Objectives: This pilot study investigated whether clinical analysis can be performed on generated synthetic data to compensate for the limited sample size of real data. Because real data are constrained by ethical issues and cost, there have been many attempts to create realistic synthetic data. The focus is on whether synthetic data can be used in place of real data. Methods: In this quasi-experimental study, we used synthetic data to analyze 11,978 lung cancer patients who received anticancer drug therapy. Clinically significant variables were extracted, and several tables containing patient status and treatment records were preprocessed. Propensity score matching was applied to reduce bias from covariates. The preprocessed data were then analyzed using Kaplan-Meier estimation and the Cox proportional hazards model. Results: When the survival curves were plotted, the curves from the synthetic data did not match the curves from the actual data for the other covariates. In Cohort 1, Gen I had a better 5-year OS than Gen II (S<sub>1</sub>=0.973, S<sub>2</sub>=0.953, <i>p</i> < 0.05). Similarly, in Cohort 2, Gen I anticancer drugs performed better than Gen III (S<sub>1</sub>=0.990, S<sub>3</sub>=0.884, <i>p</i> < 0.05). In the exploratory subgroup analysis, hazard ratios were estimated with the Cox regression model, and Gen I showed a more favorable hazard ratio than Gen II and III. However, these results differed from the actual trend. Conclusions: The analysis based on the DATA-FREE-BOX data differed from the trend of the survival analysis conducted with real data, so trends observed in such synthetic data may not reflect real trends. This finding can contribute to data validation. Moreover, the methodology used in this study is expected to be applicable to clinical studies based on actual data.
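The Kaplan-Meier estimation named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function name `kaplan_meier` and the toy follow-up data are hypothetical, chosen only to show how the survival probability S(t) is updated at each event time while censored patients leave the risk set without lowering the curve.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: S(t) = S(t-) * (1 - d / n).
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical toy data (not from the study): 5 patients, 2 censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
```

In practice, a survival analysis library would typically be used for the estimate, the log-rank test, and the Cox model; the sketch only makes the product-limit step explicit.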
Funder
Korea Health Industry Development Institute
Ministry of Health and Welfare
Publisher
The Korean Society of Health Informatics and Statistics