Abstract
Background
Medical texts such as radiology reports and electronic health records are a powerful source of data for researchers. To make them usable, anonymization methods must be developed to de-identify documents containing personal information about both patients and medical staff. Although several anonymization strategies currently exist for English, they are language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts that can be adapted to other languages.
Results
We tested four neural networks on our radiology report dataset, achieving a recall of 97.18% on the identifying entities. In addition, we developed a randomization algorithm that substitutes each detected entity with a new one from the same category, making it virtually impossible to distinguish real data from synthetic data. The three best architectures were then evaluated on the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%.
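The substitution step described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the category labels, the `SURROGATES` pools, the `span_of` and `randomize_entities` helpers, and the sample sentence are all hypothetical, and the entity spans that a trained recognizer would produce are simulated here by a plain string lookup.

```python
import random

# Hypothetical replacement pools, one per entity category (illustrative values only).
SURROGATES = {
    "NAME": ["Carlos Vega", "Marta Gil", "Elena Ruiz"],
    "HOSPITAL": ["Hospital San Rafael", "Hospital Los Olivos"],
}


def span_of(text, surface, label):
    """Stand-in for a recognizer: locate one known entity and return (start, end, label)."""
    start = text.index(surface)
    return (start, start + len(surface), label)


def randomize_entities(text, entities, rng):
    """Replace each detected span with a random surrogate from the same category."""
    # Process spans right to left so that earlier offsets stay valid
    # even when a surrogate is longer or shorter than the original.
    for start, end, label in sorted(entities, reverse=True):
        replacement = rng.choice(SURROGATES[label])
        text = text[:start] + replacement + text[end:]
    return text


report = "Paciente Juan Soler atendido en Hospital La Fe."
spans = [
    span_of(report, "Juan Soler", "NAME"),
    span_of(report, "Hospital La Fe", "HOSPITAL"),
]
anonymized = randomize_entities(report, spans, random.Random(0))
print(anonymized)
```

Because every replacement is drawn from the same category as the original, the output keeps the shape of a real report, which is what makes substituted entities hard to tell apart from genuine ones.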
Conclusions
The proposed strategy, which combines named entity recognition with randomization of the detected entities, is suitable for Spanish radiology reports. Because it does not require a large training corpus, it could easily be extended to other languages and to other medical texts, such as electronic health records.
Funder
Horizon 2020 Framework Programme
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Health Informatics, Computer Science Applications, Information Systems
Cited by
7 articles.