Natural Language Processing–Assisted Classification Models to Confirm Monoclonal Gammopathy of Undetermined Significance and Progression in Veterans' Electronic Health Records

Published:2023-09 Issue:7
ISSN:2473-4276
Container-title:JCO Clinical Cancer Informatics
language:en
Short-container-title:JCO Clin Cancer Inform

Author:

Wang Mei¹², Yu Yao-Chi¹³, Liu Lawrence¹⁴, Schoen Martin W.¹⁵, Kumar Akhil¹², Vargo Kristin¹, Colditz Graham², Thomas Theodore¹⁶, Chang Su-Hsin¹²

Affiliation:

1. Research Service, St Louis Veterans Affairs Medical Center, St Louis, MO

2. Department of Surgery, Washington University School of Medicine, St Louis, MO

3. Department of Electrical and Systems Engineering, Washington University in St Louis, St Louis, MO

4. City of Hope National Comprehensive Cancer Center, Duarte, CA

5. Department of Medicine, Saint Louis University School of Medicine, St Louis, MO

6. Department of Medicine, Washington University School of Medicine, St Louis, MO

Abstract

PURPOSE To develop and validate natural language processing (NLP)–assisted machine learning (ML)–based classification models to confirm diagnoses of monoclonal gammopathy of undetermined significance (MGUS) and multiple myeloma (MM) from electronic health records (EHRs) in the Veterans Health Administration (VHA).

MATERIALS AND METHODS We developed precompiled lexicons and classification rules as features for the following ML classifiers: logistic regression, random forest, and support vector machines (SVMs). Models using these features were trained on 36,044 EHR documents from a random sample of 400 patients with at least one International Classification of Diseases code for MGUS diagnosis from 1999 to 2021. The best-performing feature combination was calibrated in the validation set (17,826 documents/200 patients) and evaluated in the testing set (9,250 documents/100 patients). Model performance in diagnosis confirmation was compared with manual chart review results (gold standard) using recall, precision, accuracy, and F1 score. For patients correctly labeled as disease-positive, the difference between model-identified diagnosis dates and the gold standard was also computed.

RESULTS In the testing set, the NLP-assisted classification model using SVMs achieved the best performance in both MGUS and MM confirmation, with recall/precision/accuracy/F1 of 98.8%/93.3%/93.0%/96.0% for MGUS and 100.0%/92.3%/99.0%/96.0% for MM. Diagnosis dates matched those of the gold standard (within ±45 days) in 73.0% of model-confirmed MGUS and 84.6% of model-confirmed MM.

CONCLUSION An NLP-assisted classification model can reliably confirm MGUS and MM diagnoses and dates and extract laboratory results using automated interpretation of EHR data. This algorithm has the potential to be adapted to other disease areas in the VHA EHR system.
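The evaluation measures named in the abstract (recall, precision, accuracy, F1, and the ±45-day date match) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the confusion-matrix counts below are hypothetical and chosen only for illustration. The one figure taken from the abstract is the reported precision/recall pair, used to show that the reported F1 scores are consistent with it.

```python
from datetime import date


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def confirmation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compare model labels with a chart-review gold standard.

    tp/fp/fn/tn are patient counts; the names are illustrative assumptions.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1_score(precision, recall),
    }


def date_matches(model_date: date, gold_date: date, tolerance_days: int = 45) -> bool:
    """True when the two diagnosis dates differ by at most tolerance_days."""
    return abs((model_date - gold_date).days) <= tolerance_days


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(confirmation_metrics(tp=8, fp=2, fn=0, tn=10))

    # Precision and recall as reported in the abstract for MGUS and MM.
    print(round(f1_score(0.933, 0.988) * 100, 1))  # MGUS
    print(round(f1_score(0.923, 1.000) * 100, 1))  # MM

    # 45 days apart: within the tolerance window.
    print(date_matches(date(2020, 1, 1), date(2020, 2, 15)))
```

Because F1 is the harmonic mean of precision and recall, it can be recomputed from the reported percentages alone. Accuracy cannot, since it also depends on the number of true negatives.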

Publisher

American Society of Clinical Oncology (ASCO)

Subject

General Medicine

Link

https://ascopubs.org/doi/pdfdirect/10.1200/CCI.23.00081

Cited by 1 article.

1. The Association of Agent Orange Exposure with the progression of monoclonal gammopathy of undetermined significance to multiple myeloma: a population-based study of Vietnam War Era Veterans;Journal of Hematology & Oncology;2024-01-08