Affiliation:
1. Computational Biology and Bioinformatics Program, Yale University, New Haven, Connecticut, USA
2. Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
Abstract
Objective
We learn contextual embeddings for emergency department (ED) chief complaints using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art language model, to derive a compact and computationally useful representation of free-text chief complaints.
Materials and methods
Retrospective data on 2.1 million adult and pediatric ED visits were obtained from a large healthcare system, covering the period from March 2013 to July 2019. A total of 355 497 (16.4%) visits from 65 737 (8.9%) patients were removed for absence of either a structured or unstructured chief complaint. To ensure adequate training set size, chief complaint labels that comprised less than 0.01%, or 1 in 10 000, of all visits were excluded. The cutoff threshold was incremented on a log scale to create seven datasets of decreasing sparsity. The classification task was to predict the provider-assigned label from the free-text chief complaint using BERT, with Long Short-Term Memory (LSTM) and Embeddings from Language Models (ELMo) as baselines. Performance was measured as Top-k accuracy for k = 1 through 5 on a hold-out test set comprising 5% of the samples. The embedding for each free-text chief complaint was extracted from the final 768-dimensional layer of the BERT model and visualized using t-distributed stochastic neighbor embedding (t-SNE).
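As a rough illustration of the pipeline described above, the sketch below shows how a fine-tuned BERT classifier can produce Top-k label predictions, how the final-layer 768-dimensional [CLS] embedding can be extracted, and how Top-k accuracy can be computed. It assumes the HuggingFace transformers and PyTorch libraries; the checkpoint name, label count, maximum sequence length, and helper names are illustrative assumptions rather than the study's configuration, and the fine-tuning step itself (standard supervised training on the labeled visits) is omitted.

```python
import torch
from torch.nn.functional import softmax
from transformers import BertTokenizerFast, BertForSequenceClassification

# Illustrative checkpoint and label-set size (assumptions, not the study's exact setup).
MODEL_NAME = "bert-base-uncased"   # base BERT: 768-dimensional hidden states
NUM_LABELS = 434                   # one of the label-set sizes reported in Results

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS, output_hidden_states=True
)
model.eval()

def predict_and_embed(chief_complaints):
    """Return Top-5 predicted label indices and the final-layer [CLS] embedding (768-d)."""
    batch = tokenizer(chief_complaints, padding=True, truncation=True,
                      max_length=32, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    probs = softmax(out.logits, dim=-1)
    top5 = probs.topk(k=5, dim=-1).indices           # candidate chief complaint labels
    # hidden_states[-1] is the last encoder layer; index 0 is the [CLS] token.
    cls_embedding = out.hidden_states[-1][:, 0, :]   # shape: (batch_size, 768)
    return top5, cls_embedding

def top_k_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label appears among the k highest-scoring predictions."""
    topk = logits.topk(k, dim=-1).indices
    return (topk == labels.unsqueeze(-1)).any(dim=-1).float().mean().item()
```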
Results
The models achieved increasing performance on datasets of decreasing sparsity, with BERT outperforming both LSTM and ELMo. The BERT model yielded Top-1 accuracies of 0.65 and 0.69, Top-3 accuracies of 0.87 and 0.90, and Top-5 accuracies of 0.92 and 0.94 on datasets comprising 434 and 188 labels, respectively. Visualization using t-SNE mapped the learned embeddings in a clinically meaningful way, with related concepts embedded close to each other and broader types of chief complaints clustered together.
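The t-SNE visualization referenced above could be reproduced along the lines of the short sketch below, which projects the extracted 768-dimensional embeddings to two dimensions and plots them. It assumes scikit-learn and matplotlib; the perplexity value and the choice to annotate a random subset of points are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_embeddings(embeddings, label_names):
    """Project 768-d chief complaint embeddings to 2-D with t-SNE and scatter-plot them."""
    # embeddings: (n_samples, 768) array; label_names: list of n_samples strings.
    coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
    plt.figure(figsize=(8, 8))
    plt.scatter(coords[:, 0], coords[:, 1], s=4, alpha=0.5)
    # Annotate a small random sample of points so related complaints can be inspected visually.
    rng = np.random.default_rng(0)
    for i in rng.choice(len(label_names), size=min(30, len(label_names)), replace=False):
        plt.annotate(label_names[i], coords[i], fontsize=6)
    plt.title("t-SNE projection of chief complaint embeddings")
    plt.tight_layout()
    plt.show()
```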
Discussion
Despite the inherent noise in the chief complaint label space, the model learned a rich representation of chief complaints and generated reasonable predictions of their labels. The learned embeddings accurately predict provider-assigned chief complaint labels and map semantically similar chief complaints to nearby points in vector space.
Conclusion
Such a model may be used to automatically map free-text chief complaints to structured fields and to assist the development of a standardized, data-driven ontology of chief complaints for healthcare institutions.
Publisher
Oxford University Press (OUP)
Cited by
18 articles.