Private measures, random walks, and synthetic data
Published: 2024-04-20
Volume: 189, Issue: 1-2
Pages: 569-611
ISSN: 0178-8051
Container-title: Probability Theory and Related Fields
Short-container-title: Probab. Theory Relat. Fields
Language: en
Authors: March Boedihardjo, Thomas Strohmer, Roman Vershynin
Abstract
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex—but very common—machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data in general compact metric spaces, for any fixed privacy budget $$\varepsilon$$ bounded away from zero. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
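As background (not taken from the article itself), the standard notions the abstract refers to can be stated as follows. A randomized mechanism $M$ is $\varepsilon$-differentially private if its output distribution changes by at most a factor $e^{\varepsilon}$ between neighboring data sets; metric privacy (also known as $d_{\mathcal{X}}$-privacy, as in the geo-indistinguishability literature) replaces the neighboring relation by a metric $\rho$ on the input space:

```latex
% epsilon-differential privacy: for all neighboring data sets D, D'
% and all measurable output sets S,
\mathbb{P}\,[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\mathbb{P}\,[\,M(D') \in S\,].

% Metric privacy: for all inputs x, x' in a metric space (\mathcal{X}, \rho),
\mathbb{P}\,[\,M(x) \in S\,] \;\le\; e^{\varepsilon\,\rho(x, x')}\,\mathbb{P}\,[\,M(x') \in S\,].
```

Taking $\rho$ to be the Hamming distance on data sets recovers ordinary differential privacy, which is the sense in which metric privacy generalizes it.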
Funder
Directorate for Mathematical and Physical Sciences; Division of Computing and Communication Foundations; Life Sciences Division, Army Research Office; Simons Foundation
Publisher
Springer Science and Business Media LLC