Affiliation:
1. Law School, University of Exeter, Exeter, UK
Abstract
Synthetic data, generated through machine learning algorithms from original real-world data, is gaining prominence across sectors due to its potential to provide privacy-preserving alternatives to traditional data sources. However, recent studies have raised concerns about the re-identification risks of synthetic data. This article examines the legal challenges surrounding synthetic data protection, with a focus on the European Union's General Data Protection Regulation (GDPR). After briefly explaining the methods of synthetic data generation and discussing their potential for privacy preservation, the article analyses the shortcomings of the personal/non-personal dualist approach under the GDPR. It then assesses the possibility of a paradigm change in data protection legislation, moving beyond this binary categorisation. The article argues in favour of establishing clear guidelines for the generation and processing of synthetic data, prioritising the principles of transparency, accountability and fairness.