Developing and validating a natural language processing algorithm to extract preoperative cannabis use status documentation from unstructured narrative clinical notes

Authors:

Sajdeya Ruba1, Mardini Mamoun T2, Tighe Patrick J3, Ison Ronald L3, Bai Chen2, Jugl Sebastian4, Hanzhi Gao5, Zandbiglari Kimia4, Adiba Farzana I4, Winterstein Almut G4, Pearson Thomas A1, Cook Robert L1, Rouhizadeh Masoud4

Affiliations:

1. Department of Epidemiology, College of Public Health & Health Professions & College of Medicine, University of Florida, Gainesville, Florida, USA

2. Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA

3. Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, USA

4. Department of Pharmaceutical Outcomes & Policy, Center for Drug Evaluation and Safety (CoDES), University of Florida, Gainesville, Florida, USA

5. Department of Biostatistics, University of Florida, Gainesville, Florida, USA

Abstract

Objective: This study aimed to develop a natural language processing (NLP) algorithm using machine learning (ML) techniques to identify and classify documentation of preoperative cannabis use status.

Materials and Methods: We developed and applied a keyword search strategy to identify documentation of preoperative cannabis use status in clinical notes written within 60 days of surgery. We manually reviewed the matching notes and classified each instance of documentation into 8 categories based on the context, timing, and certainty of the cannabis use documentation. We applied 2 conventional ML models and 3 deep learning models and evaluated them against the manual annotation. We externally validated our model using the MIMIC-III dataset.

Results: The tested classifiers achieved classification results close to human performance, with up to 93% and 94% precision and 95% recall of preoperative cannabis use status documentation. External validation showed consistent results, with up to 94% precision and recall.

Discussion: Our NLP model successfully replicated human annotation of preoperative cannabis use documentation, providing a baseline framework for identifying and classifying documentation of cannabis use. This work adds to the NLP methods applied in healthcare for clinical concept extraction and classification, particularly those concerning social determinants of health and substance use. Our systematically developed lexicon provides a comprehensive knowledge-based resource covering a wide range of cannabis-related concepts for future NLP applications.

Conclusion: We demonstrated that documentation of preoperative cannabis use status can be accurately identified using an NLP algorithm. This approach can be used to identify comparison groups based on cannabis exposure, supporting growing research efforts that aim to guide cannabis-related clinical practices and policies.
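The keyword-screening step described under Materials and Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the term list is a small hypothetical sample rather than the study's lexicon, the preoperative window is assumed to be the 60 days up to and including the surgery date, and the names `Note`, `in_preop_window`, and `keyword_hits` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical, illustrative terms only; NOT the study's published lexicon.
CANNABIS_TERMS = ["cannabis", "marijuana", "thc", "cbd", "weed", "pot", "edibles"]

# Case-insensitive, word-boundary pattern so that "pot" does not match "potassium".
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, CANNABIS_TERMS)) + r")\b", re.IGNORECASE
)


@dataclass
class Note:
    note_id: str
    note_date: date
    text: str


def in_preop_window(note_date: date, surgery_date: date, days: int = 60) -> bool:
    """Assumed window: the `days` days before surgery, inclusive of both ends."""
    return surgery_date - timedelta(days=days) <= note_date <= surgery_date


def keyword_hits(notes, surgery_date, days=60, context=40):
    """Return (note_id, matched term, context snippet) for every keyword match
    in notes that fall inside the assumed preoperative window."""
    hits = []
    for note in notes:
        if not in_preop_window(note.note_date, surgery_date, days):
            continue  # note is outside the window; skip it
        for m in PATTERN.finditer(note.text):
            # Keep surrounding text so a human reviewer can judge context,
            # timing, and certainty of the mention.
            start = max(0, m.start() - context)
            end = min(len(note.text), m.end() + context)
            hits.append((note.note_id, m.group(0).lower(), note.text[start:end]))
    return hits


# Usage with made-up example notes for a surgery on 2020-03-15.
notes = [
    Note("n1", date(2020, 3, 1), "Patient reports daily marijuana use."),
    Note("n2", date(2019, 1, 1), "Smokes weed."),  # outside the window
    Note("n3", date(2020, 3, 10), "Potassium normal."),  # no whole-word match
]
hits = keyword_hits(notes, date(2020, 3, 15))
print(hits)
```

Word-boundary matching keeps short terms such as "pot" from matching "potassium", but slang-heavy term lists will still return irrelevant or ambiguous mentions, which is why a manual review of the matched snippets, as in the study, is useful before training classifiers.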

Funder

NIH

National Center for Advancing Translational Sciences

Publisher

Oxford University Press (OUP)

Subject

Health Informatics
