Comparative assessment of synthetic time series generation approaches in healthcare: leveraging patient metadata for accurate data synthesis

Author:

Isasa Imanol, Hernandez Mikel, Epelde Gorka, Londoño Francisco, Beristain Andoni, Larrea Xabat, Alberdi Ane, Bamidis Panagiotis, Konstantinidis Evdokimos

Abstract

Background: Synthetic data is an emerging approach for addressing legal and regulatory concerns in biomedical research that deals with personal and clinical data, either as a standalone tool or in combination with other privacy-enhancing technologies. Generating uncompromised synthetic data could significantly benefit external researchers performing secondary analyses by providing unlimited access to information while complying with the pertinent regulations. However, the original data to be synthesized (e.g., data acquired in Living Labs) may consist of subjects' static metadata and a longitudinal component (a set of time-dependent measurements), which makes it challenging to produce coherent synthetic counterparts.

Methods: Three synthetic time series generation approaches were defined and compared in this work: generating only the metadata and coupling it with the real time series from the original data (A1), generating the metadata and time series separately and joining them afterwards (A2), and generating the metadata and time series jointly (A3). The three approaches were compared using two synthetic data generation models: the Wasserstein GAN with Gradient Penalty (WGAN-GP) and DoppelGANger (DGAN). The experiments were performed on three healthcare-related longitudinal datasets: (1) Treadmill Maximal Effort Test (TMET) measurements from the University of Malaga, (2) a hypotension subset derived from the MIMIC-III v1.4 database, and (3) a lifelogging dataset named PMData.

Results: Three key dimensions of the generated synthetic data were assessed: (1) resemblance to the original data, (2) utility, and (3) privacy level. The optimal approach varies depending on the assessed dimension and metric.

Conclusion: The initial characteristics of the datasets to be synthesized play a crucial role in determining the best approach. Coupling synthetic metadata with real time series (A1) and jointly generating synthetic time series and metadata (A3) are both competitive methods, whereas generating time series and metadata separately (A2) appears to perform worse overall.
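To make the structural difference between A1, A2 and A3 concrete, the following is a minimal Python sketch of how each pipeline assembles a synthetic record from static metadata and a time series. The generators `gen_metadata`, `gen_series` and `gen_joint` are hypothetical toy stand-ins (empirical resampling and Gaussian noise), not WGAN-GP or DGAN. The random pairing used to join metadata with series in A1 and A2 is also an assumption, not the authors' documented procedure.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subject:
    metadata: Dict[str, float]  # static attributes (e.g., age, weight)
    series: List[float]         # longitudinal measurements (e.g., heart rate per step)


# Hypothetical toy generators: NOT WGAN-GP or DGAN, only placeholders
# so that the three ways of composing records can be compared.
def gen_metadata(real: List[Subject], n: int, rng: random.Random) -> List[Dict[str, float]]:
    keys = list(real[0].metadata)
    # Sample each field independently from its empirical marginal.
    return [{k: rng.choice(real).metadata[k] for k in keys} for _ in range(n)]


def gen_series(real: List[Subject], n: int, rng: random.Random,
               noise: float = 0.1) -> List[List[float]]:
    # Perturb randomly chosen real series with Gaussian noise.
    return [[v + rng.gauss(0.0, noise) for v in rng.choice(real).series] for _ in range(n)]


def gen_joint(real: List[Subject], n: int, rng: random.Random,
              noise: float = 0.1) -> List[Subject]:
    # Perturb metadata and series of the SAME real subject together,
    # so both parts of each synthetic record stay linked.
    out = []
    for _ in range(n):
        s = rng.choice(real)
        meta = {k: v + rng.gauss(0.0, noise) for k, v in s.metadata.items()}
        series = [v + rng.gauss(0.0, noise) for v in s.series]
        out.append(Subject(meta, series))
    return out


def approach_a1(real: List[Subject], rng: random.Random) -> List[Subject]:
    # A1: synthetic metadata coupled with real time series (random pairing assumed).
    metas = gen_metadata(real, len(real), rng)
    series = [list(s.series) for s in real]
    rng.shuffle(series)
    return [Subject(m, ts) for m, ts in zip(metas, series)]


def approach_a2(real: List[Subject], rng: random.Random) -> List[Subject]:
    # A2: metadata and series generated independently, then joined by position.
    metas = gen_metadata(real, len(real), rng)
    series = gen_series(real, len(real), rng)
    return [Subject(m, ts) for m, ts in zip(metas, series)]


def approach_a3(real: List[Subject], rng: random.Random) -> List[Subject]:
    # A3: metadata and series generated jointly, record by record.
    return gen_joint(real, len(real), rng)


# Usage on a tiny made-up dataset (illustrative values only).
rng = random.Random(0)
real = [Subject({"age": 30.0 + i, "weight": 70.0 + i}, [60.0 + i, 61.0 + i, 62.0 + i])
        for i in range(5)]
for name, fn in [("A1", approach_a1), ("A2", approach_a2), ("A3", approach_a3)]:
    synth = fn(real, rng)
    print(name, len(synth), synth[0].metadata)
```

The sketch is only structural: in A1 every synthetic record still carries a real series; in A2 the two parts come from independent generators, so with a random join any dependence between metadata and series in the real data is not preserved in this toy version; in A3 both parts of a record are produced together.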

Funder

Horizon 2020 Framework Programme

Department of Education, Universities and Research of the Basque Country

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics,Health Policy,Computer Science Applications

