Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts

Authors:

Tsui Fuchiang R (1, 2, 3, 4), Shi Lingyun (1, 3), Ruiz Victor (1, 3), Ryan Neal D (5), Biernesser Candice (5), Iyengar Satish (6), Walsh Colin G (7), Brent David A (5)

Affiliations:

1. Tsui Laboratory, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA

2. Department of Anesthesiology and Critical Care Medicine, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA

3. Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA

4. Department of Anesthesiology and Critical Care, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA

5. Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

6. Department of Statistics, School of Arts and Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

7. Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee, USA

Abstract

Objective: Limited research exists on predicting first-time suicide attempts, which account for two-thirds of suicide decedents. We aimed to predict first-time suicide attempts using a large, data-driven approach that applies natural language processing (NLP) and machine learning (ML) to unstructured (narrative) clinical notes and structured electronic health record (EHR) data.

Methods: This case-control study included patients aged 10–75 years who were seen in emergency departments and inpatient units between 2007 and 2016. Cases were first-time suicide attempts identified from coded diagnoses; controls were patients without suicide attempts, randomly selected regardless of demographics at a ratio of nine controls per case. Four data-driven ML models were evaluated using 2 years of historical EHR data prior to the suicide attempt or control index visit, with prediction windows ranging from 7 to 730 days. Patients without any historical notes were excluded. Accuracy and robustness were evaluated on a blind dataset (30% of the cohort).

Results: The study cohort included 45 238 patients (5099 cases, 40 139 controls), comprising 54 651 variables from 5.7 million structured records and 798 665 notes. Using both unstructured and structured data yielded significantly greater accuracy than structured data alone (area under the curve [AUC]: 0.932 vs. 0.901; P < .001). The best-predicting model used 1726 variables, with AUC = 0.932 (95% CI, 0.922–0.941). The model was robust across multiple prediction windows and across subgroups defined by demographics, point of most recent historical clinical contact, and history of depression diagnosis.

Conclusions: Our large data-driven approach using both structured and unstructured EHR data demonstrated accurate and robust prediction of first-time suicide attempts and has the potential to be deployed across various populations and clinical settings.
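The abstract reports AUCs with 95% confidence intervals on a blind 30% holdout but does not say how the split or the intervals were produced. The sketch below illustrates one common way to do this, under stated assumptions: a class-stratified 70/30 split (so the holdout keeps the case/control ratio), an AUC computed from the Mann-Whitney rank statistic, and a percentile bootstrap for the 95% CI. The function names (`stratified_split`, `auc`, `bootstrap_auc_ci`), the stratification, and the bootstrap choice are hypothetical illustrations, not the authors' pipeline; labels are assumed to be 1 for a case and 0 for a control.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_frac: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split indices so each class keeps its share in the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(len(idx) * test_frac)  # number of this class held out
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUC (an assumed method, not the paper's)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one case and one control")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

In use, a model would be fit on the `train` indices, its risk scores collected for the `test` indices, and `auc` plus `bootstrap_auc_ci` applied to those held-out labels and scores. The rank-based AUC avoids comparing every case with every control, which matters at the scale of a test set with thousands of patients; a DeLong-style analytic interval is a common alternative to the bootstrap.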

Funders

Beckwith Institute and the National Institute of Mental Health

Publisher

Oxford University Press (OUP)

Subject

Health Informatics
