Achieving anonymity via clustering

Published:2010-06 Issue:3 Volume:6 Page:1-19
ISSN:1549-6325
Container-title:ACM Transactions on Algorithms
language:en
Short-container-title:ACM Trans. Algorithms

Author:

Aggarwal Gagan¹,Panigrahy Rina²,Feder Tomás³,Thomas Dilys⁴,Kenthapadi Krishnaram⁵,Khuller Samir⁶,Zhu An¹

Affiliation:

1. Google Inc., Mountain View, CA

2. Microsoft Research, Mountain View, CA

3. Stanford University, Stanford, CA

4. Oracle, Redwood Shores, CA

5. Microsoft Research, Mountain View, CA

6. University of Maryland, College Park, MD

Abstract

Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of deidentifying records is to remove identifying fields such as social security number, name, etc. However, recent research has shown that a large fraction of the U.S. population can be identified using nonkey attributes (called quasi-identifiers) such as date of birth, gender, and zip code. The k-anonymity model protects privacy via requiring that nonkey attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for quasi-identifiers. We propose a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published. To ensure privacy of the data records, we impose the constraint that each cluster must contain no fewer than a prespecified number of data records. This technique is more general since we have a much larger choice for cluster centers than k-anonymity. In many cases, it lets us release a lot more information without compromising privacy. We also provide constant factor approximation algorithms to come up with such a clustering. This is the first set of algorithms for the anonymization problem where the performance is independent of the anonymity parameter k. We further observe that a few outlier points can significantly increase the cost of anonymization. Hence, we extend our algorithms to allow an ϵ fraction of points to remain unclustered, that is, deleted from the anonymized publication. Thus, by not releasing a small fraction of the database records, we can ensure that the data published for analysis has less distortion and hence is more useful. Our approximation algorithms for new clustering objectives are of independent interest and could be applicable in other clustering scenarios as well.
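
To make the idea in the abstract concrete, the following is a minimal Python sketch of clustering records under a minimum cluster size and publishing only per-cluster summaries. It is a naive greedy heuristic written for illustration, not the paper's approximation algorithms, and it carries no approximation guarantee. The function names (`cluster_min_size`, `centroid`, `publish`), the use of Euclidean distance on numeric quasi-identifiers, and the choice to publish a centroid, a count, and a radius for each cluster are all assumptions of this sketch.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points, members):
    """Coordinate-wise mean of the points whose indices are in `members`."""
    dims = len(points[0])
    return tuple(sum(points[i][d] for i in members) / len(members)
                 for d in range(dims))


def cluster_min_size(points, r):
    """Greedy grouping so that every cluster holds at least r points.

    Illustrative heuristic only (hypothetical helper, not from the paper).
    """
    if r < 1 or len(points) < r:
        raise ValueError("need r >= 1 and at least r points")
    unassigned = list(range(len(points)))
    clusters = []
    while len(unassigned) >= r:
        seed = unassigned[0]
        # Group the seed with its r-1 nearest still-unassigned neighbours.
        unassigned.sort(key=lambda i: dist(points[seed], points[i]))
        clusters.append(unassigned[:r])
        unassigned = unassigned[r:]
    # Fewer than r points remain: attach each to the nearest existing cluster.
    for i in unassigned:
        nearest = min(clusters,
                      key=lambda c: dist(points[i], centroid(points, c)))
        nearest.append(i)
    return clusters


def publish(points, clusters):
    """Release (center, size, radius) per cluster instead of raw records."""
    released = []
    for members in clusters:
        center = centroid(points, members)
        radius = max(dist(center, points[i]) for i in members)
        released.append((center, len(members), radius))
    return released


# Toy quasi-identifier vectors (made-up data for illustration).
records = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (5, 5)]
groups = cluster_min_size(records, r=3)
summary = publish(records, groups)
```

In this sketch a lone point far from every group, such as `(5, 5)`, inflates the radius of whichever cluster absorbs it, which is the kind of outlier cost that motivates leaving a small fraction of points unclustered.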

Funder

Division of Computing and Communication Foundations

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Mathematics (miscellaneous)

Link

https://dl.acm.org/doi/pdf/10.1145/1798596.1798602

References: 17 articles.

1. Aggarwal G., Feder T., Kenthapadi K., Motwani R., Panigrahy R., Thomas D., and Zhu A. 2005. Approximation algorithms for k-anonymity. J. Privacy Technol. Number 20051120001.

2. How to Allocate Network Centers

3. Data Privacy through Optimal k-Anonymization

4. Toward Privacy in Public Databases

Cited by 64 articles.

1. Lifting in Support of Privacy-Preserving Probabilistic Inference;KI - Künstliche Intelligenz;2024-06-13

2. New algorithms for fair k-center problem with outliers and capacity constraints;Theoretical Computer Science;2024-05

3. Improved Approximation Algorithm for the Distributed Lower-Bounded k-Center Problem;Lecture Notes in Computer Science;2024

4. Variants of Euclidean k-Center Clusterings;Lecture Notes in Computer Science;2023-12-09

5. A Multi-Objective Degree-Based Network Anonymization Method;Algorithms;2023-09-11