Data-Driven Analysis of Privacy Policies Using LexRank and KL Summarizer for Environmental Sustainability
Published:2023-03-29
Issue:7
Volume:15
Page:5941
ISSN:2071-1050
Container-title:Sustainability
language:en
Short-container-title:Sustainability
Author:
Md Abdul Quadir (1), Anand Raghav V. (1), Mohan Senthilkumar (2), Joshua Christy Jackson (1), Girish Sabhari S. (1), Devarajan Anthra (1), Iwendi Celestine (3)
Affiliation:
1. School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India
2. School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, India
3. School of Creative Technologies, University of Bolton, Bolton BL3 5AB, UK
Abstract
Natural language processing (NLP) is a field of machine learning that analyses, manipulates, and generates human language from large amounts of data. Its applications include sentiment analysis, text summarization, spam filtering, and language translation. Privacy policies are legal documents and play a vital part in any agreement. They are often very long, yet their important points still have to be read thoroughly, and customers might not have the time or the knowledge needed to understand all the complexities of a privacy policy. In this context, this paper proposes an optimal model for summarizing privacy policies. Text summarization is the process of extracting a summary from a large original text without losing vital information. Using the proposed common word reduction process combined with NLP algorithms, this paper extracts the sentences in a privacy policy that carry high weight and displays them to the customer. This saves the customer from reading the entire policy while still presenting the important lines they need to know before signing the document. The proposed method uses two extractive text summarization algorithms, LexRank and the Kullback-Leibler (KL) Summarizer, to summarize the obtained text. According to the results, the summarized sentences obtained via the common word reduction process and the text summarization algorithms were more significant than the raw privacy policy text. This novel methodology helps to identify the important common words used in a particular sector, thus allowing a more in-depth study of a privacy policy. Using the common word reduction process, the sentences were reduced by 14.63%, and applying the extractive NLP algorithms yielded the significant sentences. After applying the NLP algorithms, the repetition of common words in each sentence increased by 191.52% with the KL Summarizer and by 361.01% with LexRank. This implies that common words play a large role in determining a sector's privacy policies, making the proposed method a real-world solution for environmental sustainability.
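The two-stage idea in the abstract (filter sentences by common words, then summarize extractively) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the common word reduction step, so `keep_sentences_with_common_words` is a hypothetical reading of it (drop any sentence that contains none of a given list of common words), and `kl_summarize` is a generic greedy KL-divergence summarizer with an assumed smoothing constant `eps`. All function names, the tokenizer, and the sentence splitter are illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keep_sentences_with_common_words(sentences, common_words):
    """Hypothetical 'common word reduction': keep only sentences that
    contain at least one word from the supplied common-word list."""
    common = {w.lower() for w in common_words}
    return [s for s in sentences if common & set(tokenize(s))]


def kl_divergence(doc_counts, summary_counts, vocab_size, eps=0.01):
    """KL(document || summary) over unigram distributions.
    The summary distribution is smoothed with eps (an assumed value)
    so that words missing from the summary do not give log(0)."""
    doc_total = sum(doc_counts.values())
    sum_total = sum(summary_counts.values())
    kl = 0.0
    for word, count in doc_counts.items():
        p = count / doc_total
        q = (summary_counts.get(word, 0) + eps) / (sum_total + eps * vocab_size)
        kl += p * math.log(p / q)
    return kl


def kl_summarize(sentences, n):
    """Greedy KL summarizer: repeatedly add the sentence whose inclusion
    brings the summary's word distribution closest to the document's."""
    doc_counts = Counter(w for s in sentences for w in tokenize(s))
    vocab_size = len(doc_counts)
    chosen = []
    summary_counts = Counter()
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < n:
        best = min(
            remaining,
            key=lambda i: kl_divergence(
                doc_counts,
                summary_counts + Counter(tokenize(sentences[i])),
                vocab_size,
            ),
        )
        chosen.append(best)
        summary_counts += Counter(tokenize(sentences[best]))
        remaining.remove(best)
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

A typical call would split a policy with `split_sentences`, pass the result and a sector-specific word list to `keep_sentences_with_common_words`, and then request a fixed number of sentences from `kl_summarize`. LexRank, the second algorithm named in the abstract, ranks sentences by graph centrality over sentence similarity and is not shown here.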
Funder
Vellore Institute of Technology
Subject
Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development; Building and Construction