Unfair clause detection in terms of service across multiple languages

Published: 2024-04-03
ISSN: 0924-8463
Container-title: Artificial Intelligence and Law
Language: en
Short-container-title: Artif Intell Law

Author:

Galassi Andrea, Lagioia Francesca, Jabłonowska Agnieszka, Lippi Marco

Abstract

Most of the existing natural language processing systems for legal texts are developed for the English language. Nevertheless, there are several application domains where multiple versions of the same documents are provided in different languages, especially inside the European Union. One notable example is given by Terms of Service (ToS). In this paper, we compare different approaches to the task of detecting potential unfair clauses in ToS across multiple languages. In particular, after developing an annotated corpus and a machine learning classifier for English, we consider and compare several strategies to extend the system to other languages: building a novel corpus and training a novel machine learning system for each language, from scratch; projecting annotations across documents in different languages, to avoid the creation of novel corpora; translating training documents while keeping the original annotations; translating queries at prediction time and relying on the English system only. An extended experimental evaluation conducted on a large, original dataset indicates that the time-consuming task of re-building a novel annotated corpus for each language can often be avoided with no significant degradation in terms of performance.

Funder

Università degli Studi di Firenze

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10506-024-09398-7.pdf

References (34 articles)

1. Ajani G (2007) Coherence of terminology and search functions. In: 25 years of European Law online: the event: 25 années de Droit européen en ligne: l’événement, Oficina de Publicaciones Oficiales de las Comunidades Europeas, pp 129–136

2. Bender EM (2011) On achieving and evaluating language-independence in NLP. Linguist Issues Lang Technol 6(3):1–28

3. Chalkidis I, Fergadiotis M, Malakasiotis P, Aletras N, Androutsopoulos I (2020) LEGAL-BERT: the muppets straight out of law school. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, pp 2898–2904. https://doi.org/10.18653/v1/2020.findings-emnlp.261. https://aclanthology.org/2020.findings-emnlp.261

4. Chalkidis I, Fergadiotis M, Androutsopoulos I (2021) MultiEURLEX—a multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer. In: Proceedings of the 2021 conference on empirical methods in natural language processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, pp 6974–6996. https://doi.org/10.18653/v1/2021.emnlp-main.559. https://aclanthology.org/2021.emnlp-main.559

5. Cotterell R, Heigold G (2017) Cross-lingual character-level neural morphological tagging. In: EMNLP, Copenhagen, Denmark, pp 748–759. https://doi.org/10.18653/v1/D17-1078