A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs

Published:2024-06-14 Issue:4 Volume:29 Page:
ISSN:1382-3256
Container-title:Empirical Software Engineering
language:en
Short-container-title:Empir Software Eng

Author:

Azeem Muhammad Ilyas, Abualhaija Sallam

Abstract

Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern of requirements engineering. Personal data that is collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA) which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching billions of euros. Software systems involving personal data processing must adhere to the legal obligations stipulated at a general level in GDPR as well as the obligations outlined in DPAs highlighting specific business. In other words, a DPA is yet another source from which requirements engineers can elicit legal requirements. However, the DPA must be complete according to GDPR to ensure that the elicited requirements cover the complete set of obligations. Therefore, checking the completeness of DPAs is a prerequisite step towards developing a compliant system. Analyzing DPAs with respect to GDPR entirely manually is time-consuming and requires adequate legal expertise. In this paper, we propose an automation strategy that addresses the completeness checking of DPAs against GDPR provisions as a text classification problem. Specifically, we pursue ten alternative solutions which are enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed the $F_2$ score on a set of 30 real DPAs. Our evaluation shows that the best-performing solutions, which yield $F_2$ scores of 86.7% and 89.7%, are based on pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.
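The abstract reports results as F2 scores without stating the formula. As a minimal sketch, assuming the standard F-beta definition, (1 + beta^2) * P * R / (beta^2 * P + R), setting beta = 2 weights recall more heavily than precision. The function name `f_beta` and the precision and recall values below are illustrative assumptions, not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    With beta = 2 (the default here), recall counts more than precision.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero; the score is defined as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values for illustration only (not results from the paper).
p, r = 0.5, 1.0
print(f"F1 = {f_beta(p, r, beta=1.0):.4f}")
print(f"F2 = {f_beta(p, r):.4f}")
```

For the same precision and recall, F2 exceeds F1 whenever recall is higher than precision, since F2 gives more weight to recall.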

Funder

Fonds National de la Recherche Luxembourg

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10664-024-10491-3.pdf

Cited by 2 articles.

1. Rethinking Legal Compliance Automation: Opportunities with Large Language Models;2024 IEEE 32nd International Requirements Engineering Conference (RE);2024-06-24

2. Enhancing Legal Compliance and Regulation Analysis with Large Language Models;2024 IEEE 32nd International Requirements Engineering Conference (RE);2024-06-24