Validation of Non–Small Cell Lung Cancer Clinical Insights Using a Generalized Oncology Natural Language Processing Model

Published:2024-09 Issue:8
ISSN:2473-4276
Container-title:JCO Clinical Cancer Informatics
language:en
Short-container-title:JCO Clin Cancer Inform

Author:

Kenney Rachel C.¹,², Chen Xiaoren¹, Shintani Kazuki¹, Gagnon Clara¹, Liu John¹, DaCosta Byfield Stacey³, Ochs Lorre¹, Currie Anne-Marie¹

Affiliation:

1. Optum Insight, Optum, Eden Prairie, MN

2. Departments of Neurology and Population Health, New York University Grossman School of Medicine, New York, NY

3. Optum Labs, Optum, Eden Prairie, MN

Abstract

PURPOSE Limited studies have used natural language processing (NLP) in the context of non–small cell lung cancer (NSCLC). This study aimed to validate the application of an NLP model to an NSCLC cohort by extracting NSCLC concepts from free-text medical notes and converting them to structured, interpretable data.

METHODS Patients with a lung neoplasm, NSCLC histology, and treatment information in their notes were selected from a repository of over 27 million patients. From these, 200 were randomly selected for this study with the longest and the most recent note included for each patient. An NLP model developed and validated on a large solid and blood cancer oncology cohort was applied to this NSCLC cohort. Two certified tumor registrars and a curator abstracted concepts from the notes: neoplasm, histology, stage, TNM values, and metastasis sites. This manually abstracted gold standard was compared with the NLP model output. Precision and recall scores were calculated.

RESULTS The NLP model extracted the NSCLC concepts with excellent precision and recall with the following scores, respectively: Lung neoplasm 100% and 100%, NSCLC histology 99% and 88%, histology correctly linked to neoplasm 98% and 79%, stage value 98.8% and 92%, stage TNM value 93% and 98%, and metastasis site 97% and 89%. High precision is related to a low number of false positives, and therefore, extracted concepts are likely accurate. High recall indicates that the model captured most of the desired concepts.

CONCLUSION This study validates that Optum's oncology NLP model has high precision and recall with clinical real-world data and is a reliable model to support research studies and clinical trials. This validation study shows that our nonspecific solid tumor and blood cancer oncology model is generalizable to successfully extract clinical information from specific cancer cohorts.
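The precision and recall scores reported in the abstract compare model output against a manually abstracted gold standard. A minimal sketch of that comparison follows, assuming each extracted concept can be represented as a (patient, concept, value) tuple. The function name `precision_recall`, the tuple layout, and the example records are hypothetical illustrations, not the study's data or the authors' implementation.

```python
from typing import Hashable, Set, Tuple


def precision_recall(gold: Set[Hashable], predicted: Set[Hashable]) -> Tuple[float, float]:
    """Compute precision and recall of predicted items against a gold standard."""
    true_pos = len(gold & predicted)    # extracted and present in the gold standard
    false_pos = len(predicted - gold)   # extracted but absent from the gold standard
    false_neg = len(gold - predicted)   # in the gold standard but missed by the model

    # Guard against empty denominators (no predictions or no gold items).
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical records: (patient_id, concept, value). Not data from the study.
gold = {
    ("p1", "stage", "IIIA"),
    ("p2", "stage", "IV"),
    ("p3", "stage", "IB"),
    ("p4", "stage", "II"),
}
predicted = {
    ("p1", "stage", "IIIA"),
    ("p2", "stage", "IV"),
    ("p3", "stage", "IB"),
    ("p5", "stage", "IV"),  # false positive: not in the gold standard
}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case three of four predictions are correct and three of four gold items are found, so both scores are 0.75. A model with few false positives scores high on precision, and a model that misses few gold items scores high on recall, which matches the interpretation given in the RESULTS paragraph.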

Publisher

American Society of Clinical Oncology (ASCO)

Link

https://ascopubs.org/doi/pdfdirect/10.1200/CCI.23.00099
