Automated extraction of information from free text of Spanish oncology pathology reports

Published: 2023-08-09 Issue: 1 Volume: 54 Page: e2035300
ISSN: 1657-9534
Container-title: Colombia Medica
Short-container-title: Colomb Med

Authors:

Moreno Juan Sebastian, Bravo-Ocaña Juan Carlos, Riascos Alvaro José, Zambrano Angela Regina, Mendoza-Urbano Diana Marcela, Garcia Johan Felipe, Prada Sergio I

Abstract

Background: Pathology reports are stored as unstructured free text that is often ungrammatical, fragmented, and abbreviated, with linguistic variability among pathologists. As a result, extracting tumor information requires significant human effort. Recording data in an efficient, high-quality format is essential for implementing and establishing a hospital-based cancer registry.

Objective: To describe the implementation of a natural language processing algorithm for oncology pathology reports.

Methods: An algorithm was developed to process oncology pathology reports in Spanish and extract 20 medical descriptors. The approach is based on successive matching of regular expressions.

Results: Validation was performed with 140 pathology reports. Topography was identified in all reports by both the human reviewers and the algorithm. Morphology was identified in 138 reports by the human reviewers and in 137 by the algorithm. The average fuzzy matching score was 68.3 for topography and 89.5 for morphology.

Conclusion: A preliminary validation of the algorithm against human extraction was performed on a small set of reports, with satisfactory results. This shows that a regular-expression approach can accurately and precisely extract multiple specimen attributes from free-text Spanish pathology reports. Additionally, we developed a website to facilitate collaborative validation at a larger scale, which may be helpful for future research on the subject.
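To make the Methods and Results concrete, the following is a minimal Python sketch of the general idea: try a list of regular expressions in order for each descriptor, then compare the extracted string with a human annotation using a 0 to 100 similarity score. It is not the authors' implementation. The patterns, the sample report, and the names `TOPOGRAPHY_PATTERNS`, `MORPHOLOGY_PATTERNS`, `extract_first`, and `fuzzy_score` are all hypothetical, and the standard-library `difflib.SequenceMatcher` ratio is only a stand-in for whatever fuzzy matching score the study used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical patterns for two descriptors, tried in order of specificity.
# The real system extracts 20 descriptors; its patterns are not given here.
TOPOGRAPHY_PATTERNS = [
    re.compile(r"sitio\s+anat[oó]mico\s*:\s*(?P<value>[^.\n]+)", re.IGNORECASE),
    re.compile(r"biopsia\s+de\s+(?P<value>[^.,\n]+)", re.IGNORECASE),
]
MORPHOLOGY_PATTERNS = [
    re.compile(r"diagn[oó]stico\s*:\s*(?P<value>[^.\n]+)", re.IGNORECASE),
]


def extract_first(text, patterns):
    """Return the captured value from the first pattern that matches, else None."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            return match.group("value").strip()
    return None


def fuzzy_score(predicted, reference):
    """Case-insensitive similarity on a 0-100 scale; a missing prediction scores 0."""
    if predicted is None:
        return 0.0
    ratio = SequenceMatcher(None, predicted.lower(), reference.lower()).ratio()
    return 100.0 * ratio


# Invented sample report, for illustration only.
report = "Biopsia de mama derecha. Diagnóstico: carcinoma ductal infiltrante."

topography = extract_first(report, TOPOGRAPHY_PATTERNS)
morphology = extract_first(report, MORPHOLOGY_PATTERNS)

# Compare against (hypothetical) human annotations.
print(topography, fuzzy_score(topography, "mama derecha"))
print(morphology, fuzzy_score(morphology, "carcinoma ductal infiltrante, grado 2"))
```

Ordering the patterns from most to least specific lets a labeled field take precedence over a looser phrase match, and averaging `fuzzy_score` over all validated reports would give a summary figure comparable in spirit to the scores reported above.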

Publisher

Universidad del Valle

References (17 articles)

1. Ruiz A, Facio Á. Hospital-based cancer registry: A tool for patient care, management and quality. A focus on its use for quality assessment. Rev Oncol. 2004; 6(2): 104-13. https://doi.org/10.1007/BF02710038

2. Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, et al. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J Biomed Inform. 2017; 73: 14-29. https://doi.org/10.1016/j.jbi.2017.07.012

3. Alawad M, Gao S, Qiu JX, Yoon HJ, Christian JB, Penberthy L, et al. Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks. J Am Med Inform Assoc. 2020; 27(1): 89-98. https://doi.org/10.1093/jamia/ocz153

4. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011; 18(5): 544-51. https://doi.org/10.1136/amiajnl-2011-000464

5. Meystre S, Savova G, Kipper-Schuler KC, Hurdle JF. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2007; 128-44. https://doi.org/10.1055/s-0038-1638592