Affiliation:
1. Department of Computing Science, Umeå University, SE-90187 Umeå, Sweden
Abstract
Anonymization and data masking affect data-driven models built from the protected data. A variety of anonymization methods have been developed to strike a good trade-off between privacy guarantees and data utility. Nevertheless, the effects of data protection techniques (e.g., microaggregation and noise addition) on data integration, and on data-driven models (e.g., machine learning models) built from these data, are not well understood. In this paper, we study how data protection affects data integration, and the corresponding effects on the results of machine learning models built from the outcome of the data integration process. The experimental results show that levels of protection strong enough to prevent proper database integration do not affect, to the same degree, machine learning models that learn from the integrated database. Concretely, our preliminary analysis and experiments show that data protection techniques have a greater impact on data integration than on the machine learning models.
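The two protection techniques named in the abstract can be sketched as follows. This is a minimal illustration for univariate numeric data only; the function names (`microaggregate`, `add_noise`), the fixed-size partition heuristic, and the Gaussian noise model are our assumptions for exposition, not the paper's implementation.

```python
from statistics import mean
import random

def microaggregate(values, k=3):
    """Univariate microaggregation: sort the records, partition them into
    groups of at least k, and replace each value with its group mean.
    This fixed-size partition is a simple heuristic, not an optimal one."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    starts = list(range(0, n, k))
    # Merge a final group smaller than k into the previous one,
    # so every group satisfies the k-record guarantee.
    if len(starts) > 1 and n - starts[-1] < k:
        starts.pop()
    protected = [0.0] * n
    for s, e in zip(starts, starts[1:] + [n]):
        group = order[s:e]
        m = mean(values[i] for i in group)
        for i in group:
            protected[i] = m
    return protected

def add_noise(values, scale=1.0, seed=0):
    """Noise addition: perturb each value with zero-mean Gaussian noise
    of standard deviation `scale` (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]
```

Both transformations preserve aggregate structure (group means, distribution shape) while distorting individual records, which is one plausible reason record-level linkage for integration degrades faster than model-level learning.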
Funder
Knut and Alice Wallenberg Foundation
Swedish Research Council
Subject
Computer Networks and Communications