Abstract
Abstract
CDC WONDER is a web-based tool for the dissemination of epidemiologic data collected by the National Vital Statistics System. While CDC WONDER has built-in privacy protections, they do not satisfy formal privacy protections such as differential privacy and thus are susceptible to targeted attacks. Given the importance of making high-quality public health data publicly available while preserving the privacy of the underlying data subjects, we aim to improve the utility of a recently developed approach for generating Poisson-distributed, differentially private synthetic data by using publicly available information to truncate the range of the synthetic data. Specifically, we utilize county-level population information from the US Census Bureau and national death reports produced by the CDC to inform prior distributions on county-level death rates and infer reasonable ranges for Poisson-distributed, county-level death counts. In doing so, the requirements for satisfying differential privacy for a given privacy budget can be reduced by several orders of magnitude, thereby leading to substantial improvements in utility. To illustrate our proposed approach, we consider a dataset comprised of over 26,000 cancer-related deaths from the Commonwealth of Pennsylvania belonging to over 47,000 combinations of cause-of-death and demographic variables such as age, race, sex, and county-of-residence and demonstrate the proposed framework’s ability to preserve features such as geographic, urban/rural, and racial disparities present in the true data.
Funder
National Science Foundation
Publisher
Oxford University Press (OUP)
Subject
Applied Mathematics,Statistics, Probability and Uncertainty,Social Sciences (miscellaneous),Statistics and Probability
Reference24 articles.
1. Bayesian Image Restoration, with Two Applications in Spatial Statistics;Besag;Annals of the Institute of Statistical Mathematics,1991
2. The Natural Variability of Vital Rates and Associated Statistics;Brillinger;Biometrics,1986
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. 30 Years of Synthetic Data;Statistical Science;2024-05-01
2. Differential Privacy for Government Agencies—Are We There Yet?;Journal of the American Statistical Association;2023-01-02