Patient Representation From Structured Electronic Medical Records Based on Embedding Technique: Development and Validation Study

Published:2021-07-23 Issue:7 Volume:9 Page:e19905
ISSN:2291-9694
Container-title:JMIR Medical Informatics
language:en
Short-container-title:JMIR Med Inform

Author:

Huang Yanqun, Wang Ni, Zhang Zhiqiang, Liu Honglei, Fei Xiaolu, Wei Lan, Chen Hui

Abstract

Background: The secondary use of structured electronic medical record (sEMR) data has become a challenge due to the diversity, sparsity, and high dimensionality of the data representation. Constructing an effective representation for sEMR data is becoming more and more crucial for subsequent data applications.

Objective: We aimed to apply the embedding technique used in the natural language processing domain for the sEMR data representation and to explore the feasibility and superiority of the embedding-based feature and patient representations in clinical application.

Methods: The entire training corpus consisted of records of 104,752 hospitalized patients with 13,757 medical concepts of disease diagnoses, physical examinations and procedures, laboratory tests, medications, etc. Each medical concept was embedded into a 200-dimensional real number vector using the Skip-gram algorithm with some adaptive changes from shuffling the medical concepts in a record 20 times. The average of vectors for all medical concepts in a patient record represented the patient. For embedding-based feature representation evaluation, we used the cosine similarities among the medical concept vectors to capture the latent clinical associations among the medical concepts. We further conducted a clustering analysis on stroke patients to evaluate and compare the embedding-based patient representations. The Hopkins statistic, Silhouette index (SI), and Davies-Bouldin index were used for the unsupervised evaluation, and the precision, recall, and F1 score were used for the supervised evaluation.

Results: The dimension of patient representation was reduced from 13,757 to 200 using the embedding-based representation. The average cosine similarity of the selected disease (subarachnoid hemorrhage) and its 15 clinically relevant medical concepts was 0.973. Stroke patients were clustered into two clusters with the highest SI (0.852). Clustering analyses conducted on patients with the embedding representations showed higher applicability (Hopkins statistic 0.931), higher aggregation (SI 0.862), and lower dispersion (Davies-Bouldin index 0.551) than those conducted on patients with reference representation methods. The clustering solutions for patients with the embedding-based representation achieved the highest F1 scores of 0.944 and 0.717 for two clusters.

Conclusions: The feature-level embedding-based representations can reflect the potential clinical associations among medical concepts effectively. The patient-level embedding-based representation is easy to use as continuous input to standard machine learning algorithms and can bring performance improvements. It is expected that the embedding-based representation will be helpful in a wide range of secondary uses of sEMR data.
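The representation steps named in the Methods (shuffling the concepts of a record, averaging concept vectors into a patient vector, and comparing concepts by cosine similarity) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the concept labels, and the 3-dimensional toy vectors are all hypothetical, and the Skip-gram training that would produce the real 200-dimensional vectors is not shown.

```python
import math
import random


def shuffled_copies(record, n_shuffles=20, seed=0):
    """Return n_shuffles random permutations of one record's concepts.

    Assumption: shuffling is used because concepts in a structured
    record carry no meaningful order for a context-window model.
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(n_shuffles):
        permuted = list(record)
        rng.shuffle(permuted)
        copies.append(permuted)
    return copies


def patient_vector(record, concept_vectors):
    """Represent a patient as the element-wise mean of its concept vectors."""
    vectors = [concept_vectors[concept] for concept in record]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for learned concept embeddings.
concept_vectors = {
    "subarachnoid_hemorrhage": [0.9, 0.1, 0.0],
    "head_ct": [0.8, 0.2, 0.1],
    "nimodipine": [0.7, 0.3, 0.0],
    "metformin": [0.0, 0.2, 0.9],
}

record = ["subarachnoid_hemorrhage", "head_ct", "nimodipine"]
corpus = shuffled_copies(record)          # sequences to feed an embedding model
patient = patient_vector(record, concept_vectors)
related = cosine_similarity(concept_vectors["subarachnoid_hemorrhage"],
                            concept_vectors["head_ct"])
unrelated = cosine_similarity(concept_vectors["subarachnoid_hemorrhage"],
                              concept_vectors["metformin"])
```

In this toy setup `related` is larger than `unrelated`, mirroring the idea that clinically associated concepts end up close in the embedding space. The resulting `patient` vectors are fixed-length and continuous, so they could be passed to a standard clustering routine for the kind of evaluation the abstract describes.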

Publisher

JMIR Publications Inc.

Subject

Health Information Management,Health Informatics

References: 37 articles.

1. A Regularized Deep Learning Approach for Clinical Risk Prediction of Acute Coronary Syndrome Using Electronic Health Records

2. The application of unsupervised deep learning in predictive models using electronic health records

3. Multi-perspective predictive modeling for acute kidney injury in general hospital populations using electronic medical records

4. Feature rearrangement based deep learning system for predicting heart failure mortality

Cited by 5 articles.

1. Integration of natural and deep artificial cognitive models in medical images: BERT-based NER and relation extraction for electronic medical records;Frontiers in Neuroscience;2023-09-04

2. Representation of time-varying and time-invariant EMR data and its application in modeling outcome prediction for heart failure patients;Journal of Biomedical Informatics;2023-07

3. Heart failure disease prediction and stratification with temporal electronic health records data using patient representation;Biocybernetics and Biomedical Engineering;2023-01

4. Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study;Journal of Medical Internet Research;2022-08-03

5. Improving Performance of Outcome Prediction for In-patients with Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study (Preprint);2022-02-22