Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy

Published: 2023-07 Issue: 7
ISSN:2473-4276
Container-title:JCO Clinical Cancer Informatics
language:en
Short-container-title:JCO Clinical Cancer Informatics

Author:

Chen Shan¹², Guevara Marco¹², Ramirez Nicolas¹², Murray Arpi², Warner Jeremy L.³⁴, Aerts Hugo J. W. L.¹²⁵, Miller Timothy A.⁶, Savova Guergana K.⁶, Mak Raymond H.¹², Bitterman Danielle S.¹²

Affiliation:

1. Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA

2. Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA

3. Population Sciences Program, Legorreta Cancer Center, Brown University, Providence, RI

4. Lifespan Cancer Institute, Providence, RI

5. Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University, Maastricht, the Netherlands

6. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA

Abstract

PURPOSE Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT.

METHODS Our corpus consisted of a gold-labeled data set of 1,524 clinical notes from 124 patients with lung cancer treated with RT, manually annotated for Common Terminology Criteria for Adverse Events (CTCAE) v5.0 esophagitis grade, and a silver-labeled data set of 2,420 notes from 1,832 patients from whom toxicity grades had been collected as structured data during clinical care. We fine-tuned statistical and pretrained Bidirectional Encoder Representations from Transformers–based models for three esophagitis classification tasks: task 1, no esophagitis versus grade 1-3; task 2, grade ≤1 versus >1; and task 3, no esophagitis versus grade 1 versus grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT.

RESULTS Fine-tuning of PubMedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for tasks 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by ≥2% for all tasks. Silver-labeled data improved the macro-F1 by ≥3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for tasks 1, 2, and 3, respectively, without additional fine-tuning.

CONCLUSION To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinical notes. This provides proof of concept for NLP-based automated detailed toxicity monitoring in expanded domains.
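
As an illustration only, the three classification tasks and the macro-F1 metric named in the abstract can be sketched in plain Python. This is not the authors' code. The function names, the label strings, the use of 0 to encode "no esophagitis", the restriction to grades 0-3, and the reading of "grade ≤1" as including no esophagitis are all assumptions made for this sketch.

```python
def task_labels(grade):
    """Map a CTCAE esophagitis grade to labels for tasks 1, 2, and 3.

    Assumption: 0 encodes "no esophagitis" and only grades 0-3 occur,
    matching the grade ranges listed in the abstract.
    """
    if grade not in (0, 1, 2, 3):
        raise ValueError("expected a grade in 0-3")
    # Task 1: no esophagitis versus grade 1-3
    task1 = "none" if grade == 0 else "grade 1-3"
    # Task 2: grade <=1 versus >1 (assumed to place grade 0 in the <=1 class)
    task2 = "grade <=1" if grade <= 1 else "grade >1"
    # Task 3: no esophagitis versus grade 1 versus grade 2-3
    if grade == 0:
        task3 = "none"
    elif grade == 1:
        task3 = "grade 1"
    else:
        task3 = "grade 2-3"
    return task1, task2, task3


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro-F1 weights every class equally, a rare class such as grade 2-3 in task 3 counts as much as the common "none" class, which is one plausible reason the reported scores fall as the tasks become more fine-grained.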

Publisher

American Society of Clinical Oncology (ASCO)

Subject

General Medicine

Link

https://ascopubs.org/doi/pdfdirect/10.1200/CCI.23.00048

Cited by 1 article.

1. Large language models to identify social determinants of health in electronic health records; npj Digital Medicine; 2024-01-11