Affiliation:
1. Punjab Engineering College, India
Abstract
In recent years, data collection and data mining have emerged as fast-paced computational processes as the amount of data from different sources has increased manifold. With the advent of such technologies, a major concern is the exposure of an individual's personal information. To address this concern, a dataset is anonymized before it is released to the public for further use. The chapter discusses various existing anonymization techniques. Thereafter, a novel redaction technique is proposed for generalization that minimizes the overall cost (penalty) of the process, which is inversely proportional to the utility of the generated dataset. To validate the proposed work, the authors assume a pre-processed dataset and compare the proposed algorithm with existing techniques. Lastly, the proposed technique is made scalable, which further reduces the generalization cost and improves the overall utility of the released information.