A Histogram Publishing Method under Differential Privacy That Involves Balancing Small-Bin Availability First
Published: 2024-07-04
Issue: 7
Volume: 17
Page: 293
ISSN: 1999-4893
Container-title: Algorithms
Language: en
Short-container-title: Algorithms
Author:
Chen Jianzhang (1), Zhou Shuo (1), Qiu Jie (1), Xu Yixin (1), Zeng Bozhe (1), Fang Wanchuan (1), Chen Xiangying (1), Huang Yipeng (1), Xu Zhengquan (2), Chen Youqin (2, 3)
Affiliation:
1. College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
2. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
3. College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
Abstract
Differential privacy (DP), a cornerstone of privacy-preserving techniques, plays an indispensable role in the secure handling and sharing of sensitive data across domains such as census, healthcare, and social networks. Histograms, a visually compelling tool for presenting analytical results, are widely employed in these sectors. Numerous algorithms for publishing histograms under differential privacy have been developed, each striving to balance privacy protection with data utility. Nonetheless, the challenge of improving precision for small bins (intervals that are narrowly defined or contain relatively few data points) has yet to receive adequate attention or in-depth investigation. In standard DP histogram publishing, adding noise without regard for bin size can leave small bins disproportionately affected by noise, potentially severely impairing the overall accuracy of the histogram. In response, this paper introduces the SReB_GCA sanitization algorithm, designed to enhance the accuracy of small bins in DP histograms. SReB_GCA sorts the bins from smallest to largest and applies a greedy grouping strategy, with a predefined lower bound on the mean relative error required for a bin to be included in a group. Our theoretical analysis shows that sorting bins in ascending order prior to grouping effectively prioritizes the accuracy of smaller bins. SReB_GCA ensures strict ϵ-DP compliance and balances reconstruction error against noise error, thereby first improving the accuracy of small bins and then approximately optimizing the mean relative error of the entire histogram.
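The sort-then-group idea described above can be illustrated with a minimal Python sketch. This is not the SReB_GCA algorithm itself, whose exact grouping criterion and budget allocation are not given in this abstract. The sketch assumes that the privacy budget is split into two parts (`split`, hypothetical), that bins are sorted and grouped using a first noisy copy of the histogram so that grouping is pure post-processing, and that a bin joins the current group when its relative gap to the group's running mean is at most a hypothetical threshold `tau`. All function and parameter names are illustrative.

```python
import numpy as np

def laplace_histogram(counts, epsilon, rng):
    """Baseline release: independent Laplace(1/epsilon) noise on every bin."""
    counts = np.asarray(counts, dtype=float)
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

def greedy_groups(sorted_values, tau):
    """Greedily split ascending values into consecutive groups.

    Assumed rule (not from the paper): a value joins the current group
    if its relative gap to the group's running mean is at most tau.
    Returns lists of positions into sorted_values.
    """
    groups = [[0]]
    for i in range(1, len(sorted_values)):
        current = groups[-1]
        mean = np.mean([sorted_values[j] for j in current])
        gap = abs(sorted_values[i] - mean) / max(abs(mean), 1.0)
        if gap <= tau:
            current.append(i)
        else:
            groups.append([i])
    return groups

def sorted_grouped_histogram(counts, epsilon, tau=0.5, split=0.5, seed=0):
    """Sort bins ascending, group greedily, release noisy group means."""
    counts = np.asarray(counts, dtype=float)
    if counts.size == 0:
        raise ValueError("counts must be non-empty")
    rng = np.random.default_rng(seed)
    eps_sort = split * epsilon            # budget for the noisy sorting copy
    eps_release = (1.0 - split) * epsilon  # budget for the released means

    # Sorting and grouping only look at the noisy copy (post-processing).
    noisy = laplace_histogram(counts, eps_sort, rng)
    order = np.argsort(noisy, kind="stable")  # smallest bins first
    groups = greedy_groups(noisy[order], tau)

    released = np.empty_like(counts)
    bin_groups = []
    for g in groups:
        idx = order[g]  # original bin indices in this group
        # One record changes one count by 1, so the group mean
        # changes by at most 1/len(idx).
        scale = 1.0 / (eps_release * len(idx))
        released[idx] = counts[idx].mean() + rng.laplace(scale=scale)
        bin_groups.append(idx.tolist())
    return released, bin_groups
```

Calling `sorted_grouped_histogram([1, 2, 2, 3, 50, 52, 400], epsilon=1.0)` returns the released counts and the bin indices of each group. Averaging within a group divides the noise scale by the group size at the cost of a reconstruction error, which is the trade-off the abstract refers to.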
To validate the effectiveness of the proposed SReB_GCA method, we conducted extensive experiments on four diverse datasets: two real-life and two synthetic. The experimental results, measured by Kullback–Leibler divergence (KLD), show that SReB_GCA substantially outperforms the baseline method (DP_BASE) and several other established approaches for differentially private histogram publication.
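As a rough illustration of the evaluation metric, the Kullback–Leibler divergence between an original and a released histogram can be computed as in the sketch below. The clipping of negative released counts and the small `smoothing` constant are assumptions made here to keep the logarithm defined; the abstract does not state how the paper handles these cases.

```python
import numpy as np

def kl_divergence(true_counts, released_counts, smoothing=1e-9):
    """KL(P || Q) between two histograms normalized to distributions.

    Negative counts (possible after adding noise) are clipped to zero and a
    small constant is added to every bin; both choices are assumptions.
    """
    p = np.clip(np.asarray(true_counts, dtype=float), 0, None) + smoothing
    q = np.clip(np.asarray(released_counts, dtype=float), 0, None) + smoothing
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

A value of zero means the released histogram has the same shape as the original; larger values indicate greater distortion.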
Funder
Natural Science Foundation of China
Natural Science Foundation of Fujian Province, China
Science and Technology Innovation Special Fund of Fujian Agriculture and Forestry University
Undergraduate Innovation and Entrepreneurship Training Program of Fujian Agricultural and Forestry University
Research Fund of Fujian University of Technology