A Novel Hybrid Approach for Multi-Dimensional Data Anonymization for Apache Spark

Published: 2022-02-28 Issue: 1 Volume: 25 Page: 1-25
ISSN: 2471-2566
Container-title: ACM Transactions on Privacy and Security
Language: en
Short-container-title: ACM Trans. Priv. Secur.

Author:

Bazai Sibghat Ullah¹, Jang-Jaccard Julian¹, Alavizadeh Hooman¹

Affiliation:

1. Massey University, Auckland, New Zealand

Abstract

Multi-dimensional data anonymization approaches (e.g., Mondrian) ensure more fine-grained data privacy by applying a different anonymization strategy to each attribute. Many variations of multi-dimensional anonymization have been implemented on distributed processing platforms (e.g., MapReduce, Spark) to take advantage of their scalability and support for parallelism. According to our critical analysis of overheads, neither the existing iteration-based nor the recursion-based approaches provide effective mechanisms for creating the optimal number and relative size of resilient distributed datasets (RDDs), and they therefore suffer heavily from performance overheads. To solve this issue, we propose a novel hybrid approach for effectively implementing a multi-dimensional data anonymization strategy (e.g., Mondrian) that is scalable and provides high performance. Our hybrid approach provides a mechanism that creates far fewer RDDs, with smaller partitions attached to each RDD, than existing approaches. This optimized creation of, and operation on, RDDs is critical for many multi-dimensional data anonymization applications, which incur tremendous execution complexity. The new mechanism in our proposed hybrid approach can dramatically reduce the critical overheads involved in re-computation cost, shuffle operations, message exchange, and cache management.
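
For readers unfamiliar with Mondrian, the following is a minimal single-machine sketch of the general idea behind multi-dimensional partitioning for k-anonymity. It is not the hybrid Spark approach proposed in the paper. The function names (`mondrian`, `generalize`), the sample records, and the use of raw (unnormalized) value ranges to pick the split attribute are all assumptions made for illustration.

```python
from typing import List, Tuple

# A record holds only numeric quasi-identifier values (hypothetical: age, income).
Record = Tuple[int, ...]


def mondrian(records: List[Record], k: int) -> List[List[Record]]:
    """Recursively split records at the median of the widest attribute.

    A split is accepted only if both halves keep at least k records,
    so every returned partition satisfies k-anonymity.
    """
    if len(records) < 2 * k:
        return [records]
    dims = range(len(records[0]))
    # Try attributes from the widest to the narrowest raw value range
    # (a simplification: ranges are not normalized here).
    by_width = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_width:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    # No attribute allows a valid split: keep the group whole.
    return [records]


def generalize(partition: List[Record]) -> Tuple[Tuple[int, int], ...]:
    """Replace each attribute with its (min, max) range within the partition."""
    dims = range(len(partition[0]))
    return tuple(
        (min(r[d] for r in partition), max(r[d] for r in partition)) for d in dims
    )


# Hypothetical sample data: (age, income).
records = [(25, 1000), (27, 1200), (31, 1100), (45, 5000), (47, 5200), (52, 5100)]
partitions = mondrian(records, k=3)
for p in partitions:
    print(generalize(p))
```

Each attribute is generalized to a range that depends on the partition a record falls into, which is what distinguishes multi-dimensional schemes from applying a single global generalization per attribute. In a distributed setting, every recursive split of this kind becomes a candidate for a new dataset or partition, which is the source of the overheads the abstract describes.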

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality; General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3484945

References: 50 articles.

1. IPUMS International. (2007). Retrieved 25 Sept 2021 from https://international.ipums.org/international/.

2. Sensitivity-Based Anonymization of Big Data

3. DI-Mondrian: Distributed improved Mondrian for satisfaction of the L-diversity privacy model using Apache Spark

Cited by 6 articles.

1. Efficient Associate Rules Mining Based on Topology for Items of Transactional Data;Mathematics;2023-01-12

2. Lightweight and Bilateral Controllable Data Sharing for Secure Autonomous Vehicles Platooning Service;IEEE Transactions on Vehicular Technology;2023

3. Cyber Security's Silver Bullet - A Systematic Literature Review of AI-Powered Security;2022 3rd International Informatics and Software Engineering Conference (IISEC);2022-12-15

4. Toward Privacy Preservation Using Clustering Based Anonymization: Recent Advances and Future Research Outlook;IEEE Access;2022

5. Scalable Distributed Data Anonymization for Large Datasets;IEEE Transactions on Big Data;2022