Automated Classification of Free-text Pathology Reports for Registration of Incident Cases of Cancer

Author:

Defossez G., Burgun A., le Beux P., Levillain P., Ingrand P., Claveau V., Jouhet V.

Abstract

Objective: Our study aimed to construct and evaluate functions called “classifiers”, produced by supervised machine learning techniques, to automatically categorize pathology reports using solely their content.

Methods: Patients from the Poitou-Charentes Cancer Registry having at least one pathology report and a single non-metastatic invasive neoplasm were included. A descriptor weighting function accounting for the distribution of terms among targeted classes was developed and compared to classic methods based on inverse document frequencies. The classification was performed with support vector machine (SVM) and Naive Bayes classifiers. Two levels of granularity were tested for both the topographical and the morphological axes of the ICD-O3 code. The ability to correctly attribute a precise ICD-O3 code and the ability to attribute the broad category defined by the International Agency for Research on Cancer (IARC) for the multiple primary cancer registration rules were evaluated using F1-measures.

Results: A total of 5121 pathology reports produced by 35 pathologists were selected. The best performance was achieved by our class-weighted descriptor combined with an SVM classifier. Using this method, the pathology reports were properly classified in the IARC categories with F1-measures of 0.967 for both topography and morphology. The ICD-O3 code attribution had lower performance, with an F1-measure of 0.715 for topography and 0.854 for morphology.

Conclusion: These results suggest that free-text pathology reports could be useful as a data source for automated systems that identify and notify new cases of cancer. Future work is needed to evaluate the performance gains from natural language processing, including for reports describing multiple tumors, and from the possible incorporation of other medical documents such as surgical reports.
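As a rough illustration of two ingredients the abstract names, the sketch below computes classic TF-IDF term weights (the inverse-document-frequency baseline, not the authors' class-weighted descriptor, whose formula is not given here) and a per-class F1-measure. The function names `tf_idf` and `f1_per_class`, the toy tokens, and the labels `C50` and `C18` are assumptions chosen for illustration only; the abstract does not specify the exact weighting variant or how F1 was averaged.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF weights for a list of tokenized documents.

    Illustrative baseline only: raw term frequency times log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2PR / (P + R), taken as 0 when P + R = 0."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy data (hypothetical): two tokenized reports and four predictions.
toy_docs = [["carcinoma", "breast"], ["carcinoma", "colon"]]
toy_weights = tf_idf(toy_docs)

toy_true = ["C50", "C50", "C18", "C18"]
toy_pred = ["C50", "C18", "C18", "C18"]
toy_scores = f1_per_class(toy_true, toy_pred)
```

In this toy setup a term appearing in every report ("carcinoma") receives zero weight, which is the kind of behaviour a weighting scheme based on term distribution across target classes would be expected to refine.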

Publisher

Georg Thieme Verlag KG

Subject

Health Information Management, Advanced and Specialised Nursing, Health Informatics

Cited by 43 articles.

1. A Comprehensive Framework for Pathology Classification Bridging Precision and Interpretability; International Journal of Advanced Research in Science, Communication and Technology; 2024-07-10

2. Few-shot learning for medical text: A review of advances, trends, and opportunities; Journal of Biomedical Informatics; 2023-08

3. Lean Six Sigma: Application of the Methodology in Data Processing for Cancer Registry; International One Health Conference; 2023-06-30

4. Development and Validation of an Algorithm to Identify Patients with Advanced Cutaneous Squamous Cell Carcinoma from Pathology Reports; Journal of Investigative Dermatology; 2023-01

5. Deep learning to extract Breast Cancer diagnosis concepts; 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS); 2022-07
