Some examples of privacy-preserving sharing of COVID-19 pandemic data with statistical utility evaluation
Published: 2023-05-19
Volume: 23
Issue: 1
ISSN: 1471-2288
Container-title: BMC Medical Research Methodology
Short-container-title: BMC Med Res Methodol
Language: en
Author: Liu Fang, Wang Dong, Yan Tian
Abstract
Background
A considerable amount of data of various types was collected during the COVID-19 pandemic, and its analysis and understanding have been indispensable for curbing the spread of the disease. As the pandemic moves to an endemic state, the data collected during the pandemic will continue to be a rich source for further studying and understanding its impacts on various aspects of our society. On the other hand, naïve release and sharing of the information can raise serious privacy concerns.
Methods
We use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to illustrate the publication and sharing of granular information and individual-level pandemic data in a privacy-preserving manner. We leverage and build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. We investigate the inferential utility of the privacy-preserving information through simulation studies at different levels of privacy guarantees and demonstrate the approaches on real-life data. All the approaches employed in the study are straightforward to apply.
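For illustration only (not the authors' exact procedure), the following is a minimal Python sketch of the kind of sanitization described above: Laplace noise scaled to sensitivity/ε is added to tabular counts, and the noisy counts are then bounded as a post-processing step. The function and parameter names are hypothetical.

```python
import numpy as np

def sanitize_counts(counts, epsilon, sensitivity=1.0, total=None, rng=None):
    """Release counts under differential privacy via the Laplace mechanism."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    # Laplace noise with scale = sensitivity / epsilon for each table cell.
    noisy = counts + rng.laplace(0.0, sensitivity / epsilon, size=counts.shape)
    # Post-processing (does not affect the privacy guarantee): bound the
    # sanitized counts to a feasible range, e.g., non-negative and no larger
    # than a publicly known total.
    upper = total if total is not None else np.inf
    return np.clip(noisy, 0.0, upper)

# Example: a 4-cell case-surveillance table sanitized at epsilon = 1.
print(sanitize_counts([120, 340, 95, 12], epsilon=1.0, total=567))
```

Bounding of this kind is a standard post-processing step and, as noted in the Results below, can introduce bias when ε is small and the sample size is limited.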
Results
The empirical studies in all three data cases suggest that privacy-preserving results based on data sanitized under differential privacy can be similar to the original results at a reasonably small privacy loss ($$\epsilon \approx 1$$). Statistical inferences based on sanitized data using the multiple synthesis technique also appear valid, with nominal coverage of 95% confidence intervals when there is no noticeable bias in point estimation. When $$\epsilon < 1$$ and the sample size is not large enough, some privacy-preserving results are subject to bias, partly due to the bounding applied to sanitized data as a post-processing step to satisfy practical data constraints.
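As a rough sketch of how inference from multiple sanitized datasets can be pooled, the snippet below applies Rubin-type combining rules for multiply imputed/synthetic data; the paper's exact variance combination may differ, and the function and variable names are illustrative.

```python
import numpy as np
from scipy import stats

def combine_synthetic(estimates, variances, alpha=0.05):
    """Pool estimates from m sanitized/synthetic datasets (Rubin-type rules)."""
    q = np.asarray(estimates, dtype=float)   # point estimate from each dataset
    u = np.asarray(variances, dtype=float)   # variance estimate from each dataset
    m = len(q)
    q_bar = q.mean()                          # pooled point estimate
    u_bar = u.mean()                          # average within-dataset variance
    b = q.var(ddof=1)                         # between-dataset variance
    T = u_bar + (1.0 + 1.0 / m) * b           # total variance
    # Degrees of freedom of the reference t distribution.
    df = (m - 1) * (1.0 + u_bar / ((1.0 + 1.0 / m) * b)) ** 2
    half = stats.t.ppf(1.0 - alpha / 2.0, df) * np.sqrt(T)
    return q_bar, (q_bar - half, q_bar + half)

# Example: estimates and variances from m = 5 sanitized datasets.
est, ci = combine_synthetic([0.42, 0.45, 0.40, 0.44, 0.43],
                            [0.004, 0.005, 0.004, 0.005, 0.004])
print(est, ci)
```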
Conclusions
Our study generates statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees and on how to balance privacy protection with the statistical utility of the released information in the process.
Funder
National Science Foundation
University of Notre Dame Asia Research Collaboration Grant
China Scholarships Council program
National Science Foundation of China
Publisher
Springer Science and Business Media LLC
Subject
Health Informatics, Epidemiology
References: 63 articles.
Cited by: 2 articles.