Natural Language Processing and Machine Learning to Identify People Who Inject Drugs in Electronic Health Records

Author:

Goodman-Meza David (1,2); Tang Amber (3); Aryanfar Babak (2); Vazquez Sergio (4); Gordon Adam J (5,6); Goto Michihiko (7,8); Goetz Matthew Bidwell (2,3); Shoptaw Steven (9); Bui Alex A T (10)

Affiliation:

1. Division of Infectious Diseases, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA

2. Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, California, USA

3. Department of Internal Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA

4. Undergraduate Studies, Dartmouth College, Hanover, New Hampshire, USA

5. Informatics, Decision-Enhancement, and Analytic Sciences Center, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA

6. Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, Utah, USA

7. Department of Internal Medicine, University of Iowa, Iowa City, Iowa, USA

8. Center for Access and Delivery Research and Evaluation, Iowa City Veterans Affairs Medical Center, Iowa City, Iowa, USA

9. Department of Family Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA

10. Medical and Imaging Informatics Group, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, California, USA

Abstract

Background: Improving the identification of people who inject drugs (PWID) in electronic medical records can improve clinical decision making, risk assessment and mitigation, and health service research. Identification of PWID currently consists of heterogeneous, nonspecific International Classification of Diseases (ICD) codes as proxies. Natural language processing (NLP) and machine learning (ML) methods may have better diagnostic metrics than nonspecific ICD codes for identifying PWID.

Methods: We manually reviewed 1000 records of patients diagnosed with Staphylococcus aureus bacteremia admitted to Veterans Health Administration hospitals from 2003 through 2014. The manual review was the reference standard. We developed and trained NLP/ML algorithms with and without regular expression filters for negation (NegEx) and compared these with 11 proxy combinations of ICD codes to identify PWID. Data were split 70% for training and 30% for testing. We calculated diagnostic metrics and estimated 95% confidence intervals (CIs) by bootstrapping the hold-out test set. Best models were determined by best F-score, a summary of sensitivity and positive predictive value.

Results: Random forest with and without NegEx were the best-performing NLP/ML algorithms in the training set. Random forest with NegEx outperformed all ICD-based algorithms. F-score for the best NLP/ML algorithm was 0.905 (95% CI, .786–.967) and 0.592 (95% CI, .550–.632) for the best ICD-based algorithm. The NLP/ML algorithm had a sensitivity of 92.6% and specificity of 95.4%.

Conclusions: NLP/ML outperformed ICD-based coding algorithms at identifying PWID in electronic health records. NLP/ML models should be considered in identifying cohorts of PWID to improve clinical decision making, health services research, and administrative surveillance.
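As an illustration of the evaluation described in the abstract (an F-score summarizing sensitivity and positive predictive value, with 95% CIs obtained by bootstrapping the hold-out test set), the following is a minimal sketch and not the authors' code. It assumes the F-score is the F1 (harmonic mean) and that the CI is a simple percentile bootstrap; the abstract does not specify either choice. The function names `f_score` and `bootstrap_ci` and the toy labels are hypothetical.

```python
import random


def f_score(y_true, y_pred):
    """F1 score: harmonic mean of sensitivity and positive predictive value.

    Assumption: the abstract's "F-score" is taken to be F1.
    Labels are 1 (PWID) or 0 (not PWID).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: score is defined as 0 here
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a metric on a fixed test set.

    Assumption: resample test records with replacement and take the
    alpha/2 and 1 - alpha/2 percentiles of the resampled metric.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical reference-standard labels and model predictions (toy data).
reference = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

point_estimate = f_score(reference, predicted)
ci_low, ci_high = bootstrap_ci(reference, predicted, f_score)
print(f"F-score {point_estimate:.3f} (95% CI, {ci_low:.3f} to {ci_high:.3f})")
```

With a small test set the interval is wide, which is consistent with the wider CI reported for the NLP/ML model (evaluated on the 30% hold-out) in the abstract.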

Funder

National Institute on Drug Abuse

UCLA Center for AIDS Research

UCLA Clinical Translational Science Institute

Publisher

Oxford University Press (OUP)

Subject

Infectious Diseases, Oncology
