Generation of a Realistic Synthetic Laryngeal Cancer Cohort for AI Applications
Author:
Katalinic Mika (1), Schenk Martin (1), Franke Stefan (1), Katalinic Alexander (2), Neumuth Thomas (1), Dietz Andreas (3), Stoehr Matthaeus (3), Gaebel Jan (1)
Affiliation:
1. Innovation Center Computer Assisted Surgery, Faculty of Medicine, University Leipzig, 04109 Leipzig, Germany
2. Institute of Social Medicine and Epidemiology, University of Luebeck, 23562 Luebeck, Germany
3. Department of Otolaryngology, Head and Neck Surgery, University Hospital Leipzig, 04103 Leipzig, Germany
Abstract
Background: Obtaining large amounts of real patient data involves considerable effort and expense, and processing such data raises data protection concerns. Consequently, data sharing is not always possible, particularly when large, open science datasets are needed, as in AI development. For such purposes, the generation of realistic synthetic data may be a solution. Our project aimed to generate realistic cancer data, using laryngeal cancer as the use case.
Methods: We used the open-source software Synthea and programmed an additional module covering the development, treatment and follow-up of laryngeal cancer, based on external, real-world (RW) evidence from guidelines and cancer registries in Germany. To generate an incidence-based cohort view, we randomly drew laryngeal cancer cases from the simulated population and deceased persons, stratified by the real-world age and sex distributions at diagnosis.
Results: A module with age- and stage-specific treatment and prognosis for laryngeal cancer was successfully implemented. The synthesized population reflects RW prevalence well, and a cohort of 50,000 laryngeal cancer patients was extracted from it. Descriptive data on stage-specific and 5-year overall survival were in accordance with published data.
Conclusions: We developed a large cohort of realistic synthetic laryngeal cancer cases with Synthea. Such data can be shared and published open source without data protection issues.
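The stratified random draw described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields (`age_group`, `sex`), the function name `draw_stratified_cohort`, and the shares in `ILLUSTRATIVE_SHARES` are hypothetical placeholders, and the numbers are invented for the example rather than taken from any cancer registry.

```python
import random
from collections import defaultdict


def draw_stratified_cohort(cases, target_shares, cohort_size, seed=0):
    """Randomly draw cases so that each (age_group, sex) stratum fills
    its target share of the cohort.

    cases         -- list of dicts with hypothetical keys "age_group" and "sex"
    target_shares -- dict mapping (age_group, sex) to a share; shares sum to 1
    cohort_size   -- desired number of cases in the cohort
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible

    # Group the simulated cases by stratum.
    by_stratum = defaultdict(list)
    for case in cases:
        by_stratum[(case["age_group"], case["sex"])].append(case)

    cohort = []
    for stratum, share in target_shares.items():
        # Rounding can make the total differ slightly from cohort_size
        # when the shares do not divide evenly.
        quota = round(share * cohort_size)
        pool = by_stratum.get(stratum, [])
        if quota > len(pool):
            raise ValueError(
                f"stratum {stratum}: need {quota} cases, "
                f"only {len(pool)} simulated"
            )
        # Sample without replacement within the stratum.
        cohort.extend(rng.sample(pool, quota))
    return cohort


# Invented shares for illustration only (assumption, not registry data).
ILLUSTRATIVE_SHARES = {
    ("<60", "M"): 0.30,
    ("60+", "M"): 0.50,
    ("<60", "F"): 0.08,
    ("60+", "F"): 0.12,
}

# Toy stand-in for the pool of simulated laryngeal cancer cases.
pool_rng = random.Random(1)
simulated_cases = [
    {
        "id": i,
        "age_group": pool_rng.choice(["<60", "60+"]),
        "sex": pool_rng.choice(["M", "F"]),
    }
    for i in range(5000)
]

cohort = draw_stratified_cohort(simulated_cases, ILLUSTRATIVE_SHARES, cohort_size=500)
```

Sampling without replacement inside each stratum means the simulated pool has to be large enough in every age and sex group; the sketch raises an error rather than silently under-filling a stratum.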
Funder
German Federal Ministry of Education and Research