Natural Language Processing Algorithm to Extract Multiple Myeloma Stage From Oncology Notes in the Veterans Affairs Healthcare System

Author:

Goryachev Sergey D. (1, 2, 3), Yildirim Cenk (1, 2, 3), DuMontier Clark (4, 5, 6, 7), La Jennifer (1, 2, 7), Dharne Mayuri (2), Gaziano J. Michael (1, 2, 5, 7), Brophy Mary T. (1, 2, 3, 8), Munshi Nikhil C. (2, 7, 9), Driver Jane A. (4, 5, 7), Do Nhan V. (1, 2, 3, 8), Fillmore Nathanael R. (1, 2, 7, 9)

Affiliation:

1. Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA

2. VA Boston Healthcare System, Boston, MA

3. VA Boston Cooperative Studies Program, Boston, MA

4. New England Geriatrics Research, Education and Clinical Center, VA Boston Healthcare System, Boston, MA

5. Division of Aging, Brigham and Women's Hospital, Boston, MA

6. Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA

7. Harvard Medical School, Boston, MA

8. Boston University School of Medicine, Boston, MA

9. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA

Abstract

PURPOSE Stage in multiple myeloma (MM) is an essential measure of disease risk, but it is often missing from large databases. We aimed to develop and validate a natural language processing (NLP) algorithm to extract oncologists' documentation of stage in the national Veterans Affairs (VA) Healthcare System.

METHODS Using nationwide electronic health record (EHR) and cancer registry data from the VA Corporate Data Warehouse, we developed and validated a rule-based NLP algorithm to extract oncologist-determined MM stage. To that end, a clinician annotated MM stage within over 5,000 short snippets of clinical notes, and annotated MM stage at MM treatment initiation for 200 patients. These annotations were allocated into snippet- and patient-level development and validation sets. We developed MM stage extraction and roll-up algorithms within the development sets. After the algorithms were finalized, we validated them using standard measures in held-out validation sets.

RESULTS We developed algorithms for three different MM staging systems that have been in widespread use (Revised International Staging System [R-ISS], International Staging System [ISS], and Durie-Salmon [DS]) and for stage reported without a clearly defined system. Precision and recall were uniformly high for MM stage at the snippet level, ranging from 0.92 to 0.99 for the different MM staging systems. Performance in identifying MM stage at treatment initiation at the patient level was also excellent, with precision of 0.92, 0.96, 0.90, and 0.86 and recall of 0.99, 0.98, 0.94, and 0.92 for R-ISS, ISS, DS, and unclear stage, respectively.

CONCLUSION Our MM stage extraction algorithm uses rule-based NLP and data aggregation to accurately measure MM stage documented in oncology notes and pathology reports in the VA's national EHR system. It may be adapted to other systems where MM stage is recorded in clinical notes.
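To make the pipeline described in the abstract concrete, the following is a minimal Python sketch of snippet-level rule-based stage extraction, a patient-level roll-up, and the precision and recall computation. It is not the authors' algorithm: the abstract does not give the published rules, so the regular expressions, the majority-vote roll-up, and the names `extract_stage`, `rollup`, and `precision_recall` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns (assumption): the actual rules are not in the abstract.
STAGE = r"(III|II|I|[123])"
TAIL = r"\b\s*(?:stage\s*)?[:\-]?\s*" + STAGE
PATTERNS = [
    ("R-ISS", re.compile(r"\bR-?ISS" + TAIL + r"\b", re.I)),
    # The lookbehind keeps "ISS" from matching inside "R-ISS".
    ("ISS", re.compile(r"(?<!R-)\bISS" + TAIL + r"\b", re.I)),
    ("DS", re.compile(r"\b(?:Durie[\s-]*Salmon|DS)" + TAIL + r"[AB]?\b", re.I)),
]
# Stage mentioned without a named staging system.
UNCLEAR = re.compile(r"\bstage\s*[:\-]?\s*" + STAGE + r"[AB]?\b", re.I)
TO_ROMAN = {"1": "I", "2": "II", "3": "III"}


def normalize(raw):
    """Map Arabic or lowercase stage text to an uppercase Roman numeral."""
    return TO_ROMAN.get(raw, raw.upper())


def extract_stage(snippet):
    """Return (system, stage) for the first matching rule, else None."""
    for system, pattern in PATTERNS:
        match = pattern.search(snippet)
        if match:
            return system, normalize(match.group(1))
    match = UNCLEAR.search(snippet)
    if match:
        return "unclear", normalize(match.group(1))
    return None


def rollup(snippets):
    """Patient-level roll-up by majority vote per system (an assumption)."""
    votes = {}
    for snippet in snippets:
        result = extract_stage(snippet)
        if result:
            system, stage = result
            votes.setdefault(system, Counter())[stage] += 1
    return {system: c.most_common(1)[0][0] for system, c in votes.items()}


def precision_recall(predicted, gold):
    """Precision and recall over parallel lists; None means no stage."""
    tp = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    n_pred = sum(1 for p in predicted if p is not None)
    n_gold = sum(1 for g in gold if g is not None)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall
```

In this sketch, `extract_stage` would be applied to each annotated snippet, and `rollup` to all snippets belonging to one patient. A real roll-up would presumably also restrict snippets to a time window around treatment initiation, which is omitted here.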

Publisher

American Society of Clinical Oncology (ASCO)
