PrivacyGLUE: A Benchmark Dataset for General Language Understanding in Privacy Policies

Authors:

Shankar Atreya 1, Waldis Andreas 1, Bless Christof 1, Andueza Rodriguez Maria 1, Mazzola Luca 1

Affiliation:

1. Information Systems Research Lab, HSLU—Lucerne University of Applied Sciences and Arts, Suurstoffi 1, CH-6343 Rotkreuz, Switzerland

Abstract

Benchmarks for general language understanding have been rapidly developing in recent years of NLP research, particularly because of their utility in choosing strong-performing models for practical downstream applications. While benchmarks have been proposed in the legal language domain, virtually no such benchmarks exist for privacy policies despite their increasing importance in modern digital life. This could be explained by privacy policies falling under the legal language domain, but we find evidence to the contrary that motivates a separate benchmark for privacy policies. Consequently, we propose PrivacyGLUE as the first comprehensive benchmark of relevant and high-quality privacy tasks for measuring general language understanding in the privacy language domain. Furthermore, we release performances from multiple transformer language models and perform model–pair agreement analysis to detect tasks where models benefited from domain specialization. Our findings show the importance of in-domain pretraining for privacy policies. We believe PrivacyGLUE can accelerate NLP research and improve general language understanding for humans and AI algorithms in the privacy language domain, thus supporting the adoption and acceptance rates of solutions based on it.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Cited by 4 articles.

1. A Review of Advancements and Applications of Pre-Trained Language Models in Cybersecurity;2024 12th International Symposium on Digital Forensics and Security (ISDFS);2024-04-29

2. Natural Language Processing: Bridging the Gap between Human Language and Machine Understanding;2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies;2024-03-22

3. Creation and Analysis of a Natural Language Understanding Dataset for DoD Cybersecurity Policies (CSIAC-DoDIN V1.0);2023 International Conference on Computational Science and Computational Intelligence (CSCI);2023-12-13

4. Understanding Website Privacy Policies—A Longitudinal Analysis Using Natural Language Processing;Information;2023-11-19
