Author:
Boedihardjo March,Strohmer Thomas,Vershynin Roman
Abstract
AbstractThe protection of private information is of vital importance in data-driven research, business and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, who have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are anonymity and differential privacy. Today, another solution is gaining traction, synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper, we focus on the NP-hard challenge to develop a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but a first glance completely unrelated, problem in probability concerning the concept of covariance loss. Namely, we find a nearly optimal and constructive answer to the question how much information is lost when we take conditional expectation. Surprisingly, this excursion into theoretical probability produces mathematical techniques that allow us to derive constructive, approximately optimal solutions to difficult applied problems concerning microaggregation, privacy and synthetic data.
Funder
Swiss Federal Institute of Technology Zurich
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computational Theory and Mathematics,Computational Mathematics,Analysis
Reference37 articles.
1. Adam Meyerson and Ryan Williams. On the complexity of optimal k-anonymity. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 223–228, 2004.
2. Afonso Bandeira, Amit Singer, and Thomas Strohmer. Mathematics of Data Science. https://people.math.ethz.ch/~abandeira/BandeiraSingerStrohmer-MDS-draft.pdf, 2020.
3. Koenraad MR Audenaert. A norm compression inequality for block partitioned positive semidefinite matrices. Linear algebra and its applications, 413(1):155–176, 2006.
4. A. Blum, K. Ligett, and A. Roth, “A learning theory approach to noninteractive database privacy,” Journal of the ACM (JACM), vol. 60, no. 2, pp. 1–25, 2013.
5. Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, and Kunal Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282, 2007.
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献