PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data

Published:2022-08-26 Issue:3 Volume:6 Page:90
ISSN:2504-2289
Container-title:Big Data and Cognitive Computing
language:en
Short-container-title:BDCC

Author:

Gambarelli Gaia, Gangemi Aldo

Abstract

The pervasiveness of dialogue systems and virtual conversation applications raises an important issue: the potential for sharing sensitive information, and the consequent need for protection. To guarantee the subject’s right to privacy and avoid the leakage of private content, it is important to treat sensitive information appropriately. However, any treatment first requires identifying sensitive text, and appropriate techniques to do so automatically. The Sensitive Information Detection (SID) task has been explored in the literature in different domains and languages, but there is no common benchmark. Current approaches are mostly based on artificial neural networks (ANNs) or on transformers built on them. Our research focuses on identifying categories of personal data in informal English sentences by adopting a new logical-symbolic approach and eventually hybridising it with ANN models. We present a frame-based knowledge graph built for personal data categories defined in the Data Privacy Vocabulary (DPV). The knowledge graph is designed through the logical composition of already existing frames, and has been evaluated as background knowledge for a SID system against a labeled sensitive information dataset. The accuracy of PRIVAFRAME reached 78%. By comparison, a transformer-based model achieved 12% lower performance on the same dataset. The top-down logical-symbolic frame-based model allows a granular analysis and does not require a training dataset. These advantages lead us to use it as a layer in a hybrid model, where the logical SID is combined with an ANN-based SID tested in a previous study by the authors.
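As a rough illustration of what "logical composition of already existing frames" could look like in practice, the following minimal Python sketch maps a set of frames evoked by a sentence to personal data categories. It is not taken from PRIVAFRAME: the frame labels, category names, rule structure, and the `detect_categories` helper are all hypothetical, and the evoked frames are assumed to come from some upstream frame parser that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryRule:
    """A personal data category defined as a logical composition of frames.

    Hypothetical structure for illustration; not the PRIVAFRAME schema.
    """
    category: str
    all_of: frozenset                 # frames that must all be evoked
    any_of: frozenset = frozenset()   # if non-empty, at least one must be evoked

    def matches(self, evoked_frames):
        frames = set(evoked_frames)
        if not self.all_of <= frames:
            return False
        return not self.any_of or bool(self.any_of & frames)


# Placeholder rules: frame and category names are assumptions,
# not the compositions defined by the authors.
RULES = [
    CategoryRule("HealthCondition", frozenset({"Medical_conditions"})),
    CategoryRule(
        "Salary",
        frozenset({"Earnings_and_losses"}),
        frozenset({"Being_employed", "Employing"}),
    ),
]


def detect_categories(evoked_frames, rules=RULES):
    """Return the sorted categories whose frame composition is satisfied."""
    return sorted(r.category for r in rules if r.matches(evoked_frames))


if __name__ == "__main__":
    # Frames assumed to have been extracted from one informal sentence.
    print(detect_categories({"Earnings_and_losses", "Being_employed"}))
```

Because each category is a declarative rule over frames, a sketch like this needs no training data and can report exactly which frames triggered a label, which mirrors the granularity argument made in the abstract.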

Publisher

MDPI AG

Subject

Artificial Intelligence,Computer Science Applications,Information Systems,Management Information Systems

Link

https://www.mdpi.com/2504-2289/6/3/90/pdf

Reference: 46 articles.

1. A Real-World Data Resource of Complex Sensitive Sentences Based on Documents from the Monsanto Trial;Neerbek;Proceedings of the 12th Language Resources and Evaluation Conference,2020

2. EU General Data Protection Regulation (EU-GDPR) https://www.privacy-regulation.eu/en/4.htm

3. Text Classification for Data Loss Prevention

4. Detecting Sensitive Information of Unstructured Text Using Convolutional Neural Network

5. Named Entity Recognition for Sensitive Data Discovery in Portuguese

Cited by 3 articles.

1. Privacy BERT-LSTM: a novel NLP algorithm for sensitive information detection in textual documents;Neural Computing and Applications;2024-05-16

2. Evaluating Ontology-Based PD Monitoring and Alerting in Personal Health Knowledge Graphs and Graph Neural Networks;Information;2024-02-08

3. Is Your Model Sensitive? SPEDAC: A New Resource for the Automatic Classification of Sensitive Personal Data;IEEE Access;2023