Recognition of text areas with personal data on diagnostic images

Published: 2023-07-17; Issue: 4; Volume: 27; Pages: 150-158
ISSN: 2408-9516
Container title: Medical Visualization
Short container title: Medicinskaâ vizualizaciâ

Author:

Novik V. P.¹, Kulberg N. S.², Arzamasov K. M.¹, Chetverikov S. F.¹, Khoruzhaya A. N.¹, Kozlov D. V.¹, Kremneva E. I.³

Affiliation:

1. Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of Moscow Health Care Department

2. Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences

3. Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of Moscow Health Care Department; Research Center of Neurology

Abstract

Aim. To develop a method for detecting text areas that contain private data on medical diagnostic images, using the Tesseract module and a modified Levenshtein distance.

Materials and methods. At the initial stage, the brightness of the pixels belonging to text characters is determined for threshold filtering; the dynamic threshold is calculated from the image's pixel brightness histogram. The Tesseract module is then used for primary text recognition. A set of search strings was formed from the tag values of the DICOM files, and a modified Levenshtein distance was used to find these strings in the recognized text. The algorithm was tested on a set of DICOM files of the “Dose Report” type. Accuracy was assessed against expert markup of the blocks of private information on the images.

Results. A tool has been developed, with a set of metrics and optimal thresholds for the decision rules used in finding matches, that detects text areas with private data on medical images. On a set of 1131 medical images, the accuracy of localizing areas with personal data, compared with expert markup, was 99.86%.

Conclusion. The tool developed in this study identifies personal data on digital medical images with high accuracy, which indicates that it can be applied in practice when preparing data sets.
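The string-search step can be illustrated with a minimal Python sketch. It rests on several assumptions: it uses the classic Levenshtein distance, because the abstract does not describe the authors' modification; the function names `levenshtein` and `find_private_strings`, the `max_ratio` threshold, and the sample strings are hypothetical; and the OCR text is taken as already produced by Tesseract, with the search strings already read from the DICOM tags.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def find_private_strings(ocr_text, tag_values, max_ratio=0.25):
    """Fuzzy-search each tag value in the OCR text.

    Slides a window of the tag value's length over the text and keeps the
    window with the smallest distance. A hit is reported when
    distance / len(value) <= max_ratio (an illustrative threshold, not the
    optimal threshold reported in the paper).
    """
    text = ocr_text.upper()
    hits = []
    for value in tag_values:
        needle = value.upper()
        n = len(needle)
        if n == 0 or n > len(text):
            continue
        best_window, best_distance = None, None
        for start in range(len(text) - n + 1):
            window = text[start:start + n]
            distance = levenshtein(needle, window)
            if best_distance is None or distance < best_distance:
                best_window, best_distance = window, distance
        if best_distance / n <= max_ratio:
            hits.append((value, best_window, best_distance))
    return hits


# Invented sample: the OCR step misread the letter "O" as the digit "0".
ocr_text = "Patient: IVAN0V IVAN  ID 12345 Total DLP 350 mGy*cm"
tag_values = ["IVANOV IVAN", "12345", "19800101"]
hits = find_private_strings(ocr_text, tag_values)
```

With this sample, the misread name and the patient ID are reported as hits, while the birth date, which does not appear in the text, is not. A fixed-length window is the simplest choice; it tolerates substitutions well but handles inserted or dropped characters less gracefully than an approximate substring search would.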

Publisher

Vidar, Ltd.

Subject

Radiology, Nuclear Medicine and Imaging, Radiological and Ultrasound Technology

References (17 articles)

1. dicomstandard.org [Internet]. DICOM Standard: Current Edition [cited 2022 Aug 27]. Available from: https://www.dicomstandard.org/current.

2. Aryanto K.Y.E., Oudkerk M., van Ooijen P.M.A. Free DICOM de-identification tools in clinical research: functioning and safety of patient privacy. Eur. Radiol. 2015; 25 (12): 3685–3695. http://doi.org/10.1007/s00330-015-3794-0

3. Daye D., Wiggins W.F., Lungren M.P. et al. Implementation of Clinical Artificial Intelligence in Radiology: Who Decides and How? Special Rep. Radiol. 2022; 305 (1): E62. http://doi.org/10.1148/radiol.229021

4. dclunie.com [Internet]. David Clunie's Medical Image Format Site: Dicomcleaner [cited 2022 Aug 23]. Available from: http://www.dclunie.com.

5. Cook T.S., Zimmerman S.L., Steingall S.R. et al. Radiance: An automated, enterprise-wide solution for archiving and reporting CT radiation dose estimates. Radiographics. 2011; 31 (7): 1833–1846. http://doi.org/10.1148/rg.317115048