Data-Driven Analysis of Privacy Policies Using LexRank and KL Summarizer for Environmental Sustainability

Published:2023-03-29 Issue:7 Volume:15 Page:5941
ISSN:2071-1050
Container-title:Sustainability
language:en
Short-container-title:Sustainability

Author:

Md Abdul Quadir¹, Anand Raghav V.¹, Mohan Senthilkumar², Joshua Christy Jackson¹, Girish Sabhari S.¹, Devarajan Anthra¹, Iwendi Celestine³

Affiliation:

1. School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India

2. School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, India

3. School of Creative Technologies, University of Bolton, Bolton BL3 5AB, UK

Abstract

Natural language processing (NLP) is a field of machine learning that analyses and manipulates large amounts of text data and generates human language. NLP has a variety of applications, such as sentiment analysis, text summarization, spam filtering, and language translation. Because privacy documents are legally significant, they play a vital part in any agreement. These documents are very long, but their important points still have to be read thoroughly. Customers might not have the time or the knowledge needed to understand all the complexities of a privacy policy document. In this context, this paper proposes a model to summarize a privacy policy as effectively as possible. Text summarization is the process of extracting a summary from a large original text without losing any vital information. Using the proposed common word reduction process combined with NLP algorithms, this paper extracts the sentences in the privacy policy document that carry high weight and displays them to the customer. This saves the customer from reading the entire policy while still presenting the important lines that they need to know before signing the document. The proposed method uses two extractive text summarization algorithms, LexRank and the Kullback-Leibler (KL) Summarizer, to summarize the obtained text. According to the results, the summarized sentences obtained via the common word reduction process and the text summarization algorithms were more significant than the raw privacy policy text. This methodology helps to identify important common words used in a particular sector, thus allowing a more in-depth study of a privacy policy. The common word reduction process reduced the number of sentences by 14.63%, and applying the extractive NLP algorithms then yielded the significant sentences. After the NLP algorithms were applied, the repetition of common words in each sentence increased by 191.52% with the KL summarizer algorithm and by 361.01% with the LexRank algorithm. This implies that common words play a large role in determining a sector’s privacy policies, making the proposed method a real-world solution for environmental sustainability.
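The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the common word reduction step works, so the sketch assumes it drops every sentence that contains none of a given set of sector-common words. The summarization step is a generic greedy KL-Sum, which repeatedly adds the sentence that minimizes the Kullback-Leibler divergence between the document's unigram distribution and the summary's. The function names, the regex tokenizer, the smoothing constant `eps`, and the sample policy text are all hypothetical, and LexRank is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a simple stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_by_common_words(sentences, common_words):
    """Assumed reduction rule: keep sentences containing at least one common word."""
    return [s for s in sentences if common_words & set(tokenize(s))]


def kl_divergence(doc_counts, summary_counts, vocab, eps=1e-3):
    """KL(P_doc || Q_summary) over unigrams; Q is smoothed by eps (an assumption)."""
    doc_total = sum(doc_counts.values())
    summary_total = sum(summary_counts.values()) + eps * len(vocab)
    kl = 0.0
    for word in vocab:
        p = doc_counts[word] / doc_total
        q = (summary_counts[word] + eps) / summary_total
        kl += p * math.log(p / q)
    return kl


def kl_sum(sentences, n_sentences):
    """Greedy KL-Sum: add the sentence that lowers the divergence the most."""
    tokens = [tokenize(s) for s in sentences]
    doc_counts = Counter(w for sent in tokens for w in sent)
    vocab = list(doc_counts)
    chosen, summary_counts = [], Counter()
    while len(chosen) < min(n_sentences, len(sentences)):
        best_i, best_kl = None, float("inf")
        for i, sent in enumerate(tokens):
            if i in chosen or not sent:
                continue
            kl = kl_divergence(doc_counts, summary_counts + Counter(sent), vocab)
            if kl < best_kl:
                best_i, best_kl = i, kl
        if best_i is None:  # nothing left that can be added
            break
        chosen.append(best_i)
        summary_counts += Counter(tokens[best_i])
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical policy text and common-word set, for illustration only.
policy = (
    "We collect your email address. Our office has a nice view. "
    "We share your data with partners. You may delete your data at any time."
)
common_words = {"collect", "share", "data"}

all_sentences = split_sentences(policy)
kept = filter_by_common_words(all_sentences, common_words)
reduction_pct = 100 * (len(all_sentences) - len(kept)) / len(all_sentences)
summary = kl_sum(kept, 2)
```

Here `reduction_pct` plays the role of the sentence-reduction figure reported in the abstract, computed on the toy input rather than on real policies. Swapping `kl_sum` for a graph-based ranker would give the LexRank variant of the same pipeline.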

Funder

Vellore Institute of Technology

Publisher

MDPI AG

Subject

Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development; Building and Construction

Link

https://www.mdpi.com/2071-1050/15/7/5941/pdf
