Affiliation:
1. Department of Computing Science, Umeå University, SE-90187 Umeå, Sweden
Abstract
Anonymization and data masking affect data-driven models built from the protected data. A variety of anonymization methods have been developed to strike a good trade-off between privacy guarantees and data utility. Nevertheless, the effects of data protection techniques (e.g., microaggregation and noise addition) on data integration, and on data-driven models (e.g., machine learning models) built from these data, are not well understood. In this paper, we study how data protection affects data integration, and the corresponding effects on the results of machine learning models built from the outcome of the data integration process. The experimental results show that levels of protection strong enough to prevent proper database integration do not affect, to the same degree, machine learning models that learn from the integrated database. Concretely, our preliminary analysis and experiments show that data protection techniques have a greater impact on data integration than on the machine learning models.
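The two protection techniques named in the abstract can be sketched as follows. This is a minimal illustration for univariate numeric data only; the function names (`microaggregate`, `add_noise`), the fixed-size partition heuristic, and the Gaussian noise model are our assumptions for exposition, not the paper's implementation.

```python
from statistics import mean
import random

def microaggregate(values, k=3):
    """Univariate microaggregation: sort the records, partition them into
    groups of at least k, and replace each value with its group mean.
    This fixed-size partition is a simple heuristic, not an optimal one."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    starts = list(range(0, n, k))
    # Merge a final group smaller than k into the previous one,
    # so every group satisfies the k-record guarantee.
    if len(starts) > 1 and n - starts[-1] < k:
        starts.pop()
    protected = [0.0] * n
    for s, e in zip(starts, starts[1:] + [n]):
        group = order[s:e]
        m = mean(values[i] for i in group)
        for i in group:
            protected[i] = m
    return protected

def add_noise(values, scale=1.0, seed=0):
    """Noise addition: perturb each value with zero-mean Gaussian noise
    of standard deviation `scale` (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]
```

Both transformations preserve aggregate structure (group means, distribution shape) while distorting individual records, which is one plausible reason record-level linkage for integration degrades faster than model-level learning.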
Funder
Knut and Alice Wallenberg Foundation
Swedish Research Council
Subject
Computer Networks and Communications