Automatic depersonalization of confidential information

Author:

Babak N. G. (1), Belorybkin L. Yu. (2), Otsokov S. A. (3), Terenin A. T. (2), Shabrova A. I. (2)

Affiliation:

1. National Research University “Moscow Power Engineering Institute”; Sberbank of Russia

2. Sberbank of Russia

3. MIREA – Russian Technological University

Abstract

Objectives. As the scope of personal data transmitted online continues to grow, national legislatures are increasingly regulating the storage and processing of digital information. This paper addresses the problem of protecting personal data and other confidential information about individuals, such as information covered by bank secrecy or medical confidentiality. One approach to protecting confidential data is to depersonalize it, i.e., to transform it so that it becomes impossible to identify the specific subject to whom the data belongs. The aim of the work is to develop a method for the rapid and safe automation of the depersonalization process using machine learning technologies.

Methods. The authors propose using artificial intelligence models to implement a system that depersonalizes personal data automatically, without human labor, and with sufficient accuracy to preclude the recognition of confidential information even in unstructured data. Rule-based algorithms for improving the precision of the depersonalization system are described.

Results. To solve this problem, a named entity recognition model is trained on confidential data provided by the authors. Combined with rule-based algorithms, the model achieves an F1 score greater than 0.9. Several anonymization algorithm variants are implemented, so one can be chosen to suit a specific depersonalization task.

Conclusions. The developed system solves the problem of automatic anonymization of confidential data. This makes it possible to process and transmit confidential information securely in many areas, such as banking, government administration, and advertising campaigns. Automating the depersonalization process allows confidential information to be transferred in cases where this is necessary but currently not possible due to legal restrictions. The distinctive feature of the developed solution is that it depersonalizes both structured and unstructured data while preserving context.
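As a rough illustration of what a rule-based depersonalization pass with selectable anonymization variants might look like, the following Python sketch replaces a few regularly formatted identifiers either with a type placeholder (which keeps the surrounding context readable) or with a same-length mask. The patterns, the `RULES` table, the `depersonalize` function, and the `mode` parameter are all hypothetical and are not taken from the paper. Personal names are deliberately left untouched here, since detecting them is the kind of task the abstract assigns to the named entity recognition model.

```python
import re

# Hypothetical rule set for illustration only; these are NOT the authors' rules.
# Order matters: the card pattern runs first so that the looser phone pattern
# cannot claim a 16-digit card number.
RULES = {
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}


def depersonalize(text, mode="placeholder"):
    """Replace every rule match in `text`.

    mode="placeholder": substitute a type tag such as [EMAIL], so the reader
                        still sees what kind of value stood there.
    mode="mask":        substitute asterisks of the same length, so the
                        overall text layout is preserved.
    """
    if mode not in ("placeholder", "mask"):
        raise ValueError(f"unknown mode: {mode!r}")
    for label, pattern in RULES.items():
        if mode == "placeholder":
            text = pattern.sub(f"[{label}]", text)
        else:
            text = pattern.sub(lambda m: "*" * len(m.group(0)), text)
    return text


if __name__ == "__main__":
    sample = ("Contact Ivan at ivan.petrov@example.com or +7 495 123-45-67; "
              "card 4276 1234 5678 9012.")
    print(depersonalize(sample))
    print(depersonalize(sample, mode="mask"))
```

In this sketch the placeholder variant keeps the sentence interpretable after the values are removed, while the mask variant keeps character offsets stable, which can matter when annotations elsewhere refer to positions in the text.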

Publisher

RTU MIREA

Subject

General Materials Science

