The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms

Published:2022-11-11 Issue:1 Volume:9 Page:
ISSN:2052-4463
Container-title:Scientific Data
language:en
Short-container-title:Sci Data

Author:

Kuo Nicholas I-Hsien, Polizzotto Mark N., Finfer Simon, Garcia Federico, Sönnerborg Anders, Zazzi Maurizio, Böhm Michael, Kaiser Rolf, Jorm Louisa, Barbieri Sebastiano

Abstract

In recent years, the machine learning research community has benefited tremendously from the availability of openly accessible benchmark datasets. Clinical data are usually not openly available due to their confidential nature. This has hampered the development of reproducible and generalisable machine learning applications in health care. Here we introduce the Health Gym, a growing collection of highly realistic synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms, with a specific focus on reinforcement learning. The three synthetic datasets described in this paper present patient cohorts with acute hypotension and sepsis in the intensive care unit, and people with human immunodeficiency virus (HIV) receiving antiretroviral therapy. The datasets were created using a novel generative adversarial network (GAN). The distributions of variables, the correlations between variables, and the trends in variables over time in the synthetic datasets mirror those in the real datasets. Furthermore, the risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

https://www.nature.com/articles/s41597-022-01784-7.pdf

References: 97 articles.

1. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press 2018).

2. Mnih, V. et al. Playing Atari with deep reinforcement learning. Preprint at https://arxiv.org/abs/1312.5602 (2013).

3. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

4. Brockman, G. et al. OpenAI Gym. Preprint at https://arxiv.org/abs/1606.01540 (2016).

5. Beattie, C. et al. DeepMind Lab. Preprint at https://arxiv.org/abs/1612.03801 (2016).

Cited by 14 articles.

1. Comparative assessment of synthetic time series generation approaches in healthcare: leveraging patient metadata for accurate data synthesis;BMC Medical Informatics and Decision Making;2024-01-30

2. Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project;JMIR Medical Education;2024-01-16

3. The Secondary Isolated Data Island: Isolated Data Island Caused by Blockchain in Federated Learning;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05

4. A Privacy Nihilist’s Perspective on Clinical Data Sharing: Open Clinical Data Sharing is Dead, Long Live the Walled Garden;Journal of the Society for Clinical Data Management;2023-11-08

5. Generative AI Mitigates Representation Bias Using Synthetic Health Data;2023-09-27