Geometrically-aggregated training samples: Leveraging summary statistics to enable healthcare data democratization

Published: 2023-10-25

Authors:

Yang Jenny, Thakur Anshul, Soltan Andrew A. S., Clifton David A.

Abstract

Healthcare data is highly sensitive and confidential, with strict regulations and laws to protect patient privacy and security. However, these regulations impede access to healthcare data for the wider AI research community. As a result, AI healthcare research is often dominated by organisations with access to larger datasets, or limited to silo-based development, where models are trained and evaluated on a limited population. Taking inspiration from the non-sensitive nature of the summary statistics (mean, variance, etc.) of healthcare data, this paper proposes geometrically-aggregated training samples (GATS), where each training sample is a convex combination of multiple patients’ characteristics. Because every constructed sample blends several patients, the mapping from patients to any constructed sample is highly convoluted, which preserves patient privacy. We demonstrate that these “summary training units” provide effective training on different tabular and time-series datasets (CURIAL, UCI Adult, and eICU), and indeed behave as a summary of the original training datasets. This approach takes important steps towards data accessibility and democratization.
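The abstract describes each GATS sample as a convex combination of several patients’ characteristics, but it does not say how patients are grouped or how the weights are chosen. The following is a minimal Python sketch under stated assumptions, not the authors’ implementation: patients are grouped at random within the same outcome label (so the label carries over unchanged), and the weights are drawn from a symmetric Dirichlet distribution. The function names and the parameters `k` (patients per sample) and `alpha` (Dirichlet concentration) are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical sketch of convex-combination training samples.
# ASSUMPTIONS (not from the paper): same-label random groups, Dirichlet weights.


def dirichlet_weights(k: int, alpha: float, rng: random.Random) -> List[float]:
    """Draw k non-negative weights summing to 1 (Dirichlet via normalized Gamma draws)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def convex_combination(rows: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of feature vectors; weights must be non-negative and sum to 1."""
    if not rows or len(rows) != len(weights):
        raise ValueError("need at least one row and exactly one weight per row")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n_features = len(rows[0])
    return [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(n_features)]


def make_aggregated_samples(
    features: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    n_samples: int,
    k: int = 5,
    alpha: float = 1.0,
    seed: int = 0,
) -> Tuple[List[List[float]], List[Hashable]]:
    """Build n_samples rows, each a convex combination of up to k same-label patients."""
    rng = random.Random(seed)
    by_label: Dict[Hashable, List[Sequence[float]]] = {}
    for x, y in zip(features, labels):
        by_label.setdefault(y, []).append(x)
    label_keys = sorted(by_label)  # assumes labels are sortable (e.g. ints or strings)
    out_x: List[List[float]] = []
    out_y: List[Hashable] = []
    for _ in range(n_samples):
        y = rng.choice(label_keys)
        pool = by_label[y]
        group = rng.sample(pool, min(k, len(pool)))  # patients blended into this sample
        w = dirichlet_weights(len(group), alpha, rng)
        out_x.append(convex_combination(group, w))
        out_y.append(y)  # label is unchanged because all group members share it
    return out_x, out_y
```

With equal weights a constructed sample reduces to the group mean, e.g. `convex_combination([[1.0, 2.0], [3.0, 6.0]], [0.5, 0.5])` returns `[2.0, 4.0]`, which links the construction to the summary statistics mentioned above. A larger `k` or a larger `alpha` pushes samples toward group averages; this sketch makes no claim about any formal privacy guarantee.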

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. Generalizability Assessment of AI Models Across Hospitals: A Comparative Study in Low-Middle Income and High Income Countries; 2023-11-06