BACKGROUND
Secondary use of electronic medical records (EMRs) is widespread and supports healthcare quality improvement. Representation learning, which can automatically extract latent information from EMR data, has attracted increasing attention.
OBJECTIVE
We aimed to develop a patient representation that incorporates feature associations and task-specific feature importance to improve outcome prediction for inpatients with acute myocardial infarction (AMI).
METHODS
Medical concepts, including patients’ age, gender, diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-valued vectors using an improved skip-gram algorithm in which the concepts in each context window were selected according to their association strength, measured by the confidence of association rules. Each patient was then represented as the sum of the feature embeddings weighted by task-specific feature importance, which was used to inform the predictive model from both global and local perspectives. Finally, we applied the proposed patient representation to mortality risk prediction for 3010 and 1671 AMI inpatients from a public dataset and a private dataset, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUC).
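The context-selection and weighted-sum steps described above can be sketched as follows. This is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' implementation: each patient is treated as an unordered set of concepts; a context is assumed to keep at most `window` co-occurring concepts whose association-rule confidence is at least `min_conf` (the abstract does not give the exact selection rule); the skip-gram training itself is omitted, so the embeddings and importance weights are toy values rather than learned ones. All function names, parameters, and concept labels (`rule_confidence`, `select_context`, `skipgram_pairs`, `patient_vector`, `troponin_high`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rule_confidence(patients):
    """Confidence of each association rule a -> b over patient concept sets.

    confidence(a -> b) = (#patients with both a and b) / (#patients with a)
    """
    single = Counter()
    pair = Counter()
    for concepts in patients:
        items = sorted(set(concepts))
        single.update(items)
        pair.update(combinations(items, 2))  # unordered co-occurrence counts
    conf = {}
    for (a, b), n_ab in pair.items():
        conf[(a, b)] = n_ab / single[a]
        conf[(b, a)] = n_ab / single[b]
    return conf


def select_context(target, concepts, conf, window, min_conf):
    """Hypothetical selection rule: keep at most `window` co-occurring concepts
    whose confidence target -> c is at least `min_conf`, strongest first."""
    scored = [(conf.get((target, c), 0.0), c) for c in set(concepts) if c != target]
    kept = [(s, c) for s, c in scored if s >= min_conf]
    kept.sort(key=lambda sc: (-sc[0], sc[1]))  # ties broken alphabetically
    return [c for _, c in kept[:window]]


def skipgram_pairs(patients, conf, window=2, min_conf=0.5):
    """(target, context) training pairs that a skip-gram trainer would consume."""
    pairs = []
    for concepts in patients:
        items = sorted(set(concepts))
        for target in items:
            for ctx in select_context(target, items, conf, window, min_conf):
                pairs.append((target, ctx))
    return pairs


def patient_vector(concepts, embeddings, importance):
    """Patient representation: sum of concept embeddings weighted by importance."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for c in sorted(set(concepts)):
        w = importance.get(c, 0.0)  # concepts without a weight contribute nothing
        for i, x in enumerate(embeddings[c]):
            vec[i] += w * x
    return vec


# Toy data (hypothetical concepts, not from either dataset).
patients = [
    {"troponin_high", "aspirin", "male"},
    {"troponin_high", "aspirin"},
    {"troponin_high", "pci"},
]
conf = rule_confidence(patients)
pairs = skipgram_pairs(patients, conf, window=2, min_conf=0.5)

# Toy embeddings and task-specific importance weights (assumed, not learned).
embeddings = {"troponin_high": [1.0, 0.0], "aspirin": [0.0, 1.0],
              "male": [1.0, 1.0], "pci": [0.5, 0.5]}
importance = {"troponin_high": 0.6, "aspirin": 0.3, "male": 0.1}
vec = patient_vector(patients[0], embeddings, importance)
print(pairs)
print(vec)
```

In this toy example, `troponin_high -> pci` has confidence 1/3 and is filtered out, while `pci -> troponin_high` has confidence 1 and is kept. Selecting context by rule confidence therefore makes the context asymmetric and independent of any ordering of concepts within a record.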
RESULTS
Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on both datasets, achieving mean AUCs of 0.861 and 0.980 on the public and private datasets, respectively, whereas the highest AUCs among the reference methods were 0.852 and 0.942. The feature importance integrated into the patient representation also identified features that were consistently critical in both the prediction tasks and clinical practice.
CONCLUSIONS
Introducing feature associations and feature importance yielded an effective patient representation that improved prediction performance and supported model interpretation.