Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome

Published: 2023-03-02 Issue: 3 Volume: 6 Page: e231204
ISSN: 2574-3805
Container-title: JAMA Network Open
Language: en
Short-container-title: JAMA Netw Open

Author:

Lee Robert Y.¹², Kross Erin K.¹², Torrence Janaki¹², Li Kevin S.³, Sibley James¹⁴, Cohen Trevor¹³, Lober William B.¹³⁴⁵, Engelberg Ruth A.¹², Curtis J. Randall¹²⁴⁶

Affiliation:

1. Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle

2. Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle

3. Division of Biomedical and Health Informatics, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle

4. Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle

5. Department of Global Health, University of Washington, Seattle

6. Department of Health Systems and Population Health, University of Washington, Seattle

Abstract

Importance: Many clinical trial outcomes are documented in free-text electronic health records (EHRs), making manual data collection costly and infeasible at scale. Natural language processing (NLP) is a promising approach for measuring such outcomes efficiently, but ignoring NLP-related misclassification may lead to underpowered studies.

Objective: To evaluate the performance, feasibility, and power implications of using NLP to measure the primary outcome of EHR-documented goals-of-care discussions in a pragmatic randomized clinical trial of a communication intervention.

Design, Setting, and Participants: This diagnostic study compared the performance, feasibility, and power implications of measuring EHR-documented goals-of-care discussions using 3 approaches: (1) deep-learning NLP, (2) NLP-screened human abstraction (manual verification of NLP-positive records), and (3) conventional manual abstraction. The study included hospitalized patients aged 55 years or older with serious illness enrolled between April 23, 2020, and March 26, 2021, in a pragmatic randomized clinical trial of a communication intervention in a multihospital US academic health system.

Main Outcomes and Measures: Main outcomes were natural language processing performance characteristics, human abstractor-hours, and misclassification-adjusted statistical power of methods of measuring clinician-documented goals-of-care discussions. Performance of NLP was evaluated with receiver operating characteristic (ROC) curves and precision-recall (PR) analyses, and the effects of misclassification on power were examined using mathematical substitution and Monte Carlo simulation.

Results: A total of 2512 trial participants (mean [SD] age, 71.7 [10.8] years; 1456 [58%] female) amassed 44 324 clinical notes during 30-day follow-up. In a validation sample of 159 participants, deep-learning NLP trained on a separate training data set identified patients with documented goals-of-care discussions with moderate accuracy (maximal F1 score, 0.82; area under the ROC curve, 0.924; area under the PR curve, 0.879). Manual abstraction of the outcome from the trial data set would require an estimated 2000 abstractor-hours and would power the trial to detect a risk difference of 5.4% (assuming 33.5% control-arm prevalence, 80% power, and 2-sided α = .05). Measuring the outcome by NLP alone would power the trial to detect a risk difference of 7.6%. Measuring the outcome by NLP-screened human abstraction would require 34.3 abstractor-hours to achieve estimated sensitivity of 92.6% and would power the trial to detect a risk difference of 5.7%. Monte Carlo simulations corroborated misclassification-adjusted power calculations.

Conclusions and Relevance: In this diagnostic study, deep-learning NLP and NLP-screened human abstraction had favorable characteristics for measuring an EHR outcome at scale. Adjusted power calculations accurately quantified power loss from NLP-related misclassification, suggesting that incorporation of this approach into the design of studies using NLP would be beneficial.
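
The misclassification-adjusted power figures in the abstract can be approximated with a short calculation. The sketch below is not the authors' code. It assumes a standard normal-approximation test for two proportions, equal 1:1 allocation (1256 participants per arm, half of 2512), and nondifferential misclassification, so that a classifier with a given sensitivity and specificity turns a true prevalence p into an observed prevalence sensitivity × p + (1 − specificity) × (1 − p). The function names and the bisection search are illustrative choices. For NLP-screened human abstraction, specificity is taken as 1 on the assumption that manual verification removes false positives.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def observed_prevalence(true_p, sensitivity=1.0, specificity=1.0):
    """Prevalence an imperfect measurement would report for a true prevalence."""
    return sensitivity * true_p + (1.0 - specificity) * (1.0 - true_p)


def power_two_proportions(p_control, risk_diff, n_per_arm,
                          sensitivity=1.0, specificity=1.0, alpha=0.05):
    """Approximate power of a 2-sided test comparing two observed proportions.

    Assumption: normal approximation with unpooled variance; the negligible
    opposite-tail rejection probability is ignored.
    """
    q0 = observed_prevalence(p_control, sensitivity, specificity)
    q1 = observed_prevalence(p_control + risk_diff, sensitivity, specificity)
    se = sqrt(q0 * (1 - q0) / n_per_arm + q1 * (1 - q1) / n_per_arm)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(q1 - q0) / se - z_crit)


def min_detectable_risk_diff(p_control, n_per_arm, sensitivity=1.0,
                             specificity=1.0, alpha=0.05, target_power=0.80):
    """Smallest true risk difference detectable at the target power (bisection)."""
    if sensitivity + specificity <= 1.0:
        raise ValueError("classifier must be better than chance")
    lo, hi = 0.0, 1.0 - p_control
    for _ in range(60):
        mid = (lo + hi) / 2
        power = power_two_proportions(p_control, mid, n_per_arm,
                                      sensitivity, specificity, alpha)
        if power < target_power:
            lo = mid  # effect too small to reach the target power
        else:
            hi = mid  # target power reached; try a smaller effect
    return hi


if __name__ == "__main__":
    # Perfect measurement (conventional manual abstraction).
    print(round(min_detectable_risk_diff(0.335, 1256), 3))
    # NLP-screened human abstraction: sensitivity 92.6%, specificity assumed 1.
    print(round(min_detectable_risk_diff(0.335, 1256, sensitivity=0.926), 3))
```

Under these assumptions the two calls come out at approximately 0.054 and 0.057, in line with the 5.4% and 5.7% risk differences quoted in the abstract. Reproducing the 7.6% figure for NLP alone would require the sensitivity and specificity at the NLP operating point, which the abstract does not state.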

Publisher

American Medical Association (AMA)

Subject

General Medicine

Link

https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2801918/lee_2023_oi_230070_1680019614.14324.pdf

References: 74 articles.

1. Yim. Natural language processing in oncology: a review. JAMA Oncol. 2016.

2. Wu. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc. 2020.

3. Curtis. Using electronic health records for quality measurement and accountability in care of the seriously ill: opportunities and challenges. J Palliat Med. 2018.

4. Luo. Natural language processing for EHR-based pharmacovigilance: a structured review. Drug Saf. 2017.

5. Bejan. Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records. J Am Med Inform Assoc. 2018.

Cited by 18 articles.

1. Barriers and Strategies to Effective Serious Illness Communication for Patients with End-Stage Liver Disease in the Intensive Care Setting. Journal of Intensive Care Medicine. 2024-09-09.

2. Large language multimodal models for new-onset type 2 diabetes prediction using five-year cohort electronic health records. Scientific Reports. 2024-09-06.

3. Harnessing Natural Language Processing to Assess Quality of End-of-Life Care for Children With Cancer. JCO Clinical Cancer Informatics. 2024-09.

4. Factors Associated with Costly Hospital Care among Patients with Dementia and Acute Respiratory Failure. Annals of the American Thoracic Society. 2024-06.

5. Bidirectional Encoder Representations from Transformers in Radiology: A Systematic Review of Natural Language Processing Applications. Journal of the American College of Radiology. 2024-06.