Abstract
Privacy policies outline the data collection and sharing practices an organization follows, together with the choice and control measures available to users for managing those practices. However, users often struggle to read and understand such documents, even though they are written in natural language. These fundamental problems persist despite advances in privacy design, frameworks, and regulations. To identify why privacy policies remain persistently hard to comprehend, it is vital to investigate historical policy patterns and understand how privacy policies have evolved in the way they package and present information. To this end, we create a sentence-level classifier and conduct a large-scale longitudinal analysis of privacy policies from 130,604 organizations, totaling approximately one million policies from 1997 to 2019. To implement the classifier, we annotate 10,717 sentences from 115 policies in the OPP-115 corpus and use those annotations to train XLNet and BERT classifiers. Our analysis reveals that specific data practice categories undergo more frequent policy changes than others, making it challenging to track relevant information over time. In addition, we find that every category has distinct composition, readability, and structural issues, which are exacerbated when categories frequently co-occur in a document. Based on our observations, we provide recommendations for policy articulation and revision to give privacy policy documents better coherence and structure.
Publisher
Privacy Enhancing Technologies Symposium Advisory Board
Cited by
6 articles.