Utility-aware Privacy Perturbation for Training Data

Published: 2024-02-13 Issue: 4 Volume: 18 Page: 1-21
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Language: en
Short-container-title: ACM Trans. Knowl. Discov. Data

Author:

Li Xinjiao¹, Wu Guowei¹, Yao Lin², Zheng Zhaolong¹, Geng Shisong³

Affiliation:

1. School of Software, Dalian University of Technology, China

2. DUT-RU International School of Information Science & Engineering, Dalian University of Technology, China

3. Institute of Software, Chinese Academy of Sciences, China

Abstract

Data perturbation under a differential privacy constraint is an important approach to protecting data privacy. However, as the number of data dimensions increases, the privacy budget allocated to each dimension decreases and the amount of noise added to each dimension therefore increases, which eventually lowers data utility in training tasks. To protect the privacy of training data while enhancing data utility, we propose a Utility-aware training data Privacy Perturbation scheme based on attribute Partition and budget Allocation (UPPPA). UPPPA comprises three procedures: quantification of attribute privacy and attribute importance, attribute partition, and budget allocation. The quantification of attribute privacy and attribute importance, based on information entropy and attribute correlation, provides an arithmetic basis for attribute partition and budget allocation. During attribute partition, all attributes of the training data are classified into high and low classes to achieve privacy amplification and utility enhancement. During budget allocation, a γ-privacy model is proposed to balance data privacy and data utility, providing a privacy constraint that guides budget allocation. Three comprehensive real-world datasets are used to evaluate the performance of UPPPA. Experiments and privacy analysis show that our scheme achieves a tradeoff between privacy and utility.
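
The dimensionality problem described at the start of the abstract can be illustrated with a minimal sketch. This is not the UPPPA scheme; it is the standard Laplace mechanism with a uniform budget split, which is the kind of baseline the abstract contrasts against. The function names, the even split of the budget, and the per-attribute sensitivity of 1 are all assumptions made for illustration.

```python
import random


def laplace_scale(total_epsilon: float, num_dims: int, sensitivity: float = 1.0) -> float:
    """Per-dimension Laplace noise scale under an even budget split (assumed baseline).

    Each dimension receives total_epsilon / num_dims, so the scale
    sensitivity / (total_epsilon / num_dims) grows linearly with num_dims.
    """
    if total_epsilon <= 0 or num_dims <= 0:
        raise ValueError("total_epsilon and num_dims must be positive")
    return sensitivity * num_dims / total_epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb_record(record, total_epsilon, sensitivity=1.0, rng=None):
    """Add independent Laplace noise to every attribute of one record."""
    rng = rng or random.Random()
    scale = laplace_scale(total_epsilon, len(record), sensitivity)
    return [value + sample_laplace(scale, rng) for value in record]


if __name__ == "__main__":
    # Hypothetical total budget; the noise scale rises with dimensionality.
    for dims in (1, 10, 100):
        print(dims, laplace_scale(total_epsilon=1.0, num_dims=dims))
```

With a fixed total budget, the noise scale in this sketch grows in proportion to the number of attributes, which is the utility loss that motivates allocating the budget unevenly according to attribute privacy and importance.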

Funder

National Natural Science Foundation of China

Research Foundation of the Key Laboratory of Spaceborne Information Intelligent Interpretation

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3639411
