Rounding based continuous data discretization for statistical disclosure control

Published:2019-09-25
ISSN:1868-5137
Container-title:Journal of Ambient Intelligence and Humanized Computing
language:en
Short-container-title:J Ambient Intell Human Comput

Author:

Senavirathne Navoda, Torra Vicenç

Abstract

“Rounding” can be understood as a way to coarsen continuous data. That is, low-level and infrequent values are replaced by high-level and more frequent representative values. This concept is explored as a method for data privacy in the statistical disclosure control literature, with perturbative techniques like rounding and microaggregation and non-perturbative methods like generalisation. Even though “rounding” is well known as a numerical data protection method, to the best of our knowledge it has not been studied in depth or evaluated empirically. This work is motivated by three objectives: (1) to study the alternative methods of obtaining the rounding values to represent a given continuous variable, (2) to empirically evaluate rounding as a data protection technique based on information loss (IL) and disclosure risk (DR), and (3) to analyse the impact of data rounding on machine learning based models. To obtain the rounding values, we consider discretization methods introduced in the unsupervised machine learning literature along with microaggregation and re-sampling based approaches. The results indicate that microaggregation-based techniques are preferred over unsupervised discretization methods due to their fair trade-off between IL and DR.
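
To make the idea of replacing values by representatives concrete, the sketch below shows two generic ways of obtaining rounding values for one continuous variable: equal-width discretization (each value becomes the midpoint of its bin) and a simple fixed-size univariate microaggregation (each value becomes the mean of its group of at least k sorted neighbours). This is an illustration under stated assumptions, not the procedure evaluated in the paper; the function names `round_equal_width` and `round_microaggregation` and the parameters `n_bins` and `k` are hypothetical.

```python
from statistics import mean


def round_equal_width(values, n_bins):
    """Replace each value by the midpoint of its equal-width bin (assumed scheme)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: nothing to coarsen.
        return list(values)
    width = (hi - lo) / n_bins
    rounded = []
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        rounded.append(lo + (idx + 0.5) * width)
    return rounded


def round_microaggregation(values, k):
    """Replace each value by the mean of its group of >= k sorted neighbours."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the records ordered by value, so results keep the input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    rounded = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # If fewer than k records would remain, merge them into this group.
        if n - end < k:
            end = n
        group = order[start:end]
        centroid = mean([values[i] for i in group])
        for i in group:
            rounded[i] = centroid
        start = end
    return rounded
```

In both functions a smaller number of representatives (fewer bins, larger k) coarsens the data more, which is the kind of trade-off between information loss and disclosure risk that the abstract refers to.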

Funder

Vetenskapsrådet

Publisher

Springer Science and Business Media LLC

Subject

General Computer Science

Link

http://link.springer.com/content/pdf/10.1007/s12652-019-01489-7.pdf
