Abstract
The availability of large, deidentified health datasets has enabled significant innovation in using machine learning (ML) to better understand patients and their diseases. However, questions remain regarding the true privacy of this data, patient control over their data, and how we regulate data sharing in a way that that does not encumber progress or further potentiate biases for underrepresented populations. After reviewing the literature on potential reidentifications of patients in publicly available datasets, we argue that the cost—measured in terms of access to future medical innovations and clinical software—of slowing ML progress is too great to limit sharing data through large publicly available databases for concerns of imperfect data anonymization. This cost is especially great for developing countries where the barriers preventing inclusion in such databases will continue to rise, further excluding these populations and increasing existing biases that favor high-income countries. Preventing artificial intelligence’s progress towards precision medicine and sliding back to clinical practice dogma may pose a larger threat than concerns ofpotentialpatient reidentification within publicly available datasets. While the risk to patient privacy should be minimized, we believe this risk will never be zero, and society has to determine an acceptable risk threshold below which data sharing can occur—for the benefit of a global medical knowledge system.
Publisher
Public Library of Science (PLoS)
Reference60 articles.
1. COVID-19 Chest X-Ray Dataset Initiative. Available from: https://github.com/agchung/Figure1-COVID-chestxray-dataset. [cited Mar 2021].
2. Irvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, et al., editors. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. Proceedings of the AAAI Conference on Artificial Intelligence; 2019.
3. MIMIC-IV (version 1.0);A Johnson;PhysioNet,2021
4. MIMIC-III, a freely accessible critical care database.;AEW Johnson;Sci Data,2016
5. Imagine…(a common language for ICU data inquiry and analysis).;LJ Kaplan;Intensive Care Med,2020
Cited by
42 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献