Rule-Based Information Extraction from Free-Text Pathology Reports Reveals Trends in South African Female Breast Cancer Molecular Subtypes and Ki67 Expression

Author:

Achilonu Okechinyere J. (1), Singh Elvira (1, 2), Nimako Gideon (1, 3), Eijkemans René M. J. C. (4), Musenge Eustasius (1)

Affiliation:

1. Division of Epidemiology and Biostatistics, School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Parktown, Johannesburg, South Africa

2. National Cancer Registry, National Health Laboratory Service, 1 Modderfontein Road, Sandringham, Johannesburg, South Africa

3. Industrialization, Science, Technology and Innovation Hub, African Union Development Agency (AUDA-NEPAD), Johannesburg, South Africa

4. Julius Center for Health Sciences and Primary Care, University Medical Center, Utrecht University, Utrecht, Netherlands

Abstract

Clinical information on molecular subtypes and the Ki67 index is critical for breast cancer (BC) prognosis and personalised treatment planning. Extracting such information into structured data is essential for research, auditing, and cancer incidence reporting, and underpins the potential for automated decision support. Here, we developed a rule-based natural language processing algorithm that retrieved and extracted important BC parameters from free-text pathology reports in order to explore trends in molecular subtypes and Ki67 proliferation. We considered malignant BC pathology reports with different free-text narrative attributes from the South African National Health Laboratory Service. The reports were preprocessed and parsed by the algorithm. Parameters extracted by the algorithm were validated against manually extracted parameters. Across all extracted parameters, the algorithm achieved precision of 83-100%, recall of 93-100%, F1-score of 91-100%, and kappa of 92-100%. There was a significant trend in the proportion of each molecular subtype by patient age, histologic type, grade, Ki67, and race. The findings also showed a significant association of the Ki67 trend with hormone receptors, human epidermal growth factors, age, grade, and race. Our approach bridges the gap between data availability and actionable knowledge and provides a framework that could be adapted and reused for other cancers and for studies beyond cancer. Information extracted from these reports showed trends that may be used to plan BC screening and treatment resources in South Africa. Finally, this study strongly encourages the implementation of synoptic-style pathology reporting in South Africa.
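As an illustration of the workflow the abstract describes (rule-based extraction followed by validation against manual annotation), a minimal Python sketch is given below. It is not the authors' algorithm: the regular expressions, the function names `extract` and `binary_metrics`, and the sample report text are all hypothetical, and real pathology reports would need far more rules than these two patterns.

```python
import re

# Hypothetical rules: the abstract does not list the paper's actual patterns.
KI67_RE = re.compile(r"ki[\s-]?67[^0-9%]{0,40}?(\d{1,3})\s*%", re.IGNORECASE)
RECEPTOR_RE = re.compile(
    r"\b(ER|PR|HER2)\b[^.\n]{0,30}?\b(positive|negative)\b", re.IGNORECASE
)


def extract(report):
    """Pull a Ki67 percentage and ER/PR/HER2 status out of free text."""
    out = {}
    match = KI67_RE.search(report)
    if match:
        out["ki67"] = int(match.group(1))
    for name, status in RECEPTOR_RE.findall(report):
        # Keep the first mention of each receptor.
        out.setdefault(name.upper(), status.lower())
    return out


def binary_metrics(gold, pred):
    """Precision, recall, F1 and Cohen's kappa for two binary label lists."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        g, p = bool(g), bool(p)
        if g and p:
            tp += 1
        elif p:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of both annotators.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


# Made-up report text, for illustration only.
sample = "ER: positive. PR: negative. HER2 negative. Ki-67 proliferation index: 25%."
fields = extract(sample)
scores = binary_metrics(gold=[1, 1, 0, 0], pred=[1, 0, 0, 0])
```

In this sketch, `gold` stands for the manually extracted labels and `pred` for the algorithm's output on the same reports, so the four scores correspond to the metrics quoted in the abstract.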

Funder

UK Government

Publisher

Hindawi Limited

Subject

General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine
