Automatic information extraction from childhood cancer pathology reports

Published: 2022-04-06 Issue: 2 Volume: 5
ISSN: 2574-2531
Container-title: JAMIA Open
language: en

Author:

Yoon Hong-Jun¹, Peluso Alina¹, Durbin Eric B², Wu Xiao-Cheng³, Stroup Antoinette⁴, Doherty Jennifer⁵, Schwartz Stephen⁶, Wiggins Charles⁷, Coyle Linda⁸, Penberthy Lynne⁹

Affiliation:

1. Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA

2. College of Medicine, University of Kentucky, Lexington, Kentucky, USA

3. School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana, USA

4. Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey, USA

5. Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA

6. Fred Hutchinson Cancer Research Center, Epidemiology Program, Seattle, Washington, USA

7. University of New Mexico, Albuquerque, New Mexico, USA

8. Information Management Services Inc., Calverton, Maryland, USA

9. National Cancer Institute, Bethesda, Maryland, USA

Abstract

Objectives: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, there has been no development of machine learning models for the ICCC classification. We developed deep learning-based information extraction models from cancer pathology reports based on the ICD-O-3 coding standard. In this article, we describe extending the models to perform ICCC classification.

Materials and Methods: We developed 2 models, ICD-O-3 classification and ICCC recoding (Model 1) and direct ICCC classification (Model 2), and 4 scenarios subject to the training sample size. We evaluated these models with a corpus consisting of 29 206 reports with age at diagnosis between 0 and 19 from 6 state cancer registries.

Results: Our findings suggest that the direct ICCC classification (Model 2) is substantially better than reusing the ICD-O-3 classification model (Model 1). Applying the uncertainty quantification mechanism to assess the confidence of the algorithm in assigning a code demonstrated that the model achieved a micro-F1 score of 0.987 while abstaining (not sufficiently confident to assign a code) on only 14.8% of ambiguous pathology reports.

Conclusions: Our experimental results suggest that the machine learning-based automatic information extraction from childhood cancer pathology reports in the ICCC is a reliable means of supplementing human annotators at state cancer registries by reading and abstracting the majority of the childhood cancer pathology reports accurately and reliably.
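The pairing of a micro-F1 score with an abstention rate can be illustrated with a minimal sketch. The abstract does not specify the uncertainty quantification mechanism, so the sketch below assumes a simple confidence threshold on the top class probability; the function names, the 0.8 threshold, the group labels, and the probabilities are all hypothetical and are not taken from the article.

```python
from typing import Optional, Sequence, Tuple


def predict_or_abstain(
    probs: Sequence[float], labels: Sequence[str], threshold: float
) -> Optional[str]:
    """Return the most probable label, or None (abstain) when the top
    probability falls below the threshold. Assumed rule, not the paper's."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best] if probs[best] >= threshold else None


def micro_f1_and_abstention(
    y_true: Sequence[str], y_pred: Sequence[Optional[str]]
) -> Tuple[float, float]:
    """Micro-F1 over the reports that received a code, plus the abstention rate.

    For single-label classification every error counts once as a false
    positive and once as a false negative, so micro-F1 equals accuracy.
    """
    kept = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    abstention_rate = 1.0 - len(kept) / len(y_true)
    if not kept:
        return 0.0, abstention_rate
    correct = sum(1 for t, p in kept if t == p)
    return correct / len(kept), abstention_rate


# Toy data: three hypothetical diagnostic groups and four reports.
LABELS = ["I", "II", "III"]
Y_TRUE = ["I", "II", "III", "I"]
PROBS = [
    [0.90, 0.05, 0.05],  # confident and correct
    [0.10, 0.85, 0.05],  # confident and correct
    [0.40, 0.35, 0.25],  # low confidence, so the model abstains
    [0.05, 0.90, 0.05],  # confident but wrong
]

Y_PRED = [predict_or_abstain(p, LABELS, threshold=0.8) for p in PROBS]
score, abstained = micro_f1_and_abstention(Y_TRUE, Y_PRED)
print(f"micro-F1 on coded reports: {score:.3f}, abstention rate: {abstained:.2f}")
```

In this toy run the model codes three of the four reports and gets two of them right. Raising the threshold generally trades a higher abstention rate for a higher score on the reports that are still coded, which is the trade-off the reported figures summarize.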

Funder

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

https://academic.oup.com/jamiaopen/article-pdf/5/2/ooac049/44109607/ooac049.pdf
