Author:
Ali Stephen R., Strafford Huw, Dobbs Thomas D., Fonferko-Shadrach Beata, Lacey Arron S., Pickrell William Owen, Hutchings Hayley A., Whitaker Iain S.
Abstract
Introduction: Routinely collected healthcare data are a powerful research resource, but they often lack the detailed disease-specific information that is recorded in clinical free text such as histopathology reports. We aim to use natural language processing (NLP) techniques to extract detailed clinical and pathological information from histopathology reports to enrich routinely collected data.

Methods: We used the General Architecture for Text Engineering (GATE) framework to build an NLP information extraction system using rule-based techniques. During validation, we deployed our rule-based NLP pipeline on 200 previously unseen, de-identified and pseudonymised basal cell carcinoma (BCC) histopathology reports from Swansea Bay University Health Board, Wales, UK. The output of our algorithm was compared with gold standard human annotation by two independent and blinded expert clinicians involved in skin cancer care.

Results: We identified 11,224 items of information with a mean precision, recall, and F1 score of 86.0% (95% CI: 75.1–96.9), 84.2% (95% CI: 72.8–96.1), and 84.5% (95% CI: 73.0–95.1), respectively. The difference between clinician annotator F1 scores was 7.9%, compared with 15.5% between the NLP pipeline and the gold standard corpus. Cohen's kappa on annotated tokens was 0.85.

Conclusion: Using a rule-based NLP approach for named entity recognition in BCC, we developed and validated a pipeline with potential applications in improving the quality of cancer registry data, supporting service planning, and enhancing the quality of routinely collected data for research.
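
The evaluation measures reported in the Results (precision, recall, F1 score, and Cohen's kappa on annotated tokens) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the token labels, and the example counts are all hypothetical, and the sketch only shows the standard definitions of these measures.

```python
from collections import Counter


def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sequence of tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty token sequence")
    n = len(labels_a)
    # Observed agreement: fraction of tokens given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both annotators used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical counts and labels, for illustration only (not data from the study).
example_scores = precision_recall_f1(tp=8, fp=2, fn=2)
annotator_1 = ["O", "O", "SITE", "SITE"]
annotator_2 = ["O", "O", "SITE", "O"]
example_kappa = cohens_kappa(annotator_1, annotator_2)
```

In a setting like the one described, the counts would come from comparing pipeline output against the gold standard corpus per information item, and the per-item scores would then be averaged to give mean values.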