Automated Extraction and Presentation of Data Practices in Privacy Policies

Published: 2021-01-29 Issue: 2 Volume: 2021 Page: 88-110
ISSN: 2299-0984
Container-title: Proceedings on Privacy Enhancing Technologies
Language: en

Author:

Bui Duc¹, Shin Kang G.¹, Choi Jong-Min², Shin Junbum²

Affiliation:

1. University of Michigan

2. Samsung Research

Abstract

Privacy policies are documents required by laws and regulations that notify users of the collection, use, and sharing of their personal information by services or applications. While extracting personal data objects and their usage is one of the fundamental steps in the automated analysis of these policies, it remains challenging because policy statements are complex and written in vague legal language. Prior work is limited by small or generated datasets and manually created rules. We formulate the extraction of fine-grained personal data phrases and the corresponding data collection or sharing practices as a sequence-labeling problem that can be solved by an entity-recognition model. We create a large dataset with 4.1k sentences (97k tokens) and 2.6k annotated fine-grained data practices from 30 real-world privacy policies to train and evaluate neural networks. We present a fully automated system, called PI-Extract, which accurately extracts privacy practices using a neural model and outperforms strong rule-based baselines by a large margin. We conduct a user study on the effects of data practice annotation, which highlights and describes the data practices extracted by PI-Extract to help users better understand privacy-policy documents. Our experimental evaluation results show that the annotation significantly improves users’ reading comprehension of policy texts, as indicated by a 26.6% increase in the average total reading score.
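
To illustrate the sequence-labeling formulation mentioned in the abstract, the following is a minimal sketch of how per-token BIO tags can be decoded into labeled data phrases. It is not PI-Extract's implementation: the function name `decode_bio`, the label names `COLLECT` and `SHARE`, and the example sentence with its tags are all assumptions made for illustration, and the tags are written by hand rather than predicted by a model.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token BIO tags into (label, phrase) spans.

    Hypothetical helper for illustration only. An I- tag whose label does
    not continue the open span is treated like O and closes that span.
    """
    spans: List[Tuple[str, str]] = []
    label = None          # label of the span currently being built
    words: List[str] = []  # tokens of the span currently being built

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; flush any span that was open.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continue the open span.
            words.append(token)
        else:
            # Outside any span (or an inconsistent I- tag): close the open span.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []

    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


# Hand-written example sentence and tags (assumed, not from the dataset).
tokens = ["We", "collect", "your", "email", "address", "and", "share",
          "device", "identifiers", "with", "partners", "."]
tags = ["O", "O", "O", "B-COLLECT", "I-COLLECT", "O", "O",
        "B-SHARE", "I-SHARE", "O", "O", "O"]

practices = decode_bio(tokens, tags)
print(practices)
```

In a full system the `tags` list would come from a trained entity-recognition model; the decoding step shown here only converts those tags into phrases that can be highlighted for the reader.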

Publisher

Walter de Gruyter GmbH

Subject

General Medicine

Link

https://www.sciendo.com/pdf/10.2478/popets-2021-0019

Reference: 75 articles.

1. United States Federal Trade Commission. Privacy online: a report to Congress. The Commission, 1998.

2. OECD, OCDE. The OECD principles of corporate governance. Contaduría y Administración, (216), 2004.

3. European Parliament and Council of the European Union. General data protection regulation. page 88, 2016.

4. Aleecia McDonald and Lorrie Faith Cranor. Beliefs and Behaviors: Internet Users’ Understanding of Behavioral Advertising. SSRN Scholarly Paper ID 1989092, Social Science Research Network, Rochester, NY, August 2010.

5. Ashwini Rao, Florian Schaub, Norman Sadeh, Alessandro Acquisti, and Ruogu Kang. Expecting the Unexpected: Understanding Mismatched Privacy Expectations Online. pages 77–96, 2016.

Cited by 21 articles.

1. Measuring privacy policy compliance in the Alexa ecosystem: In-depth analysis;Computers & Security;2024-09

2. Evaluating Quantized Llama 2 Models for IoT Privacy Policy Language Generation;Future Internet;2024-06-26

3. VioDroid-Finder: automated evaluation of compliance and consistency for Android apps;Empirical Software Engineering;2024-05

4. A Large Language Model Approach to Code and Privacy Policy Alignment;2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER);2024-03-12

5. From Privacy Policies to Privacy Threats: A Case Study in Policy-Based Threat Modeling;Proceedings of the 22nd Workshop on Privacy in the Electronic Society;2023-11-26