Mortality Prediction of Patients With Cardiovascular Disease Using Medical Claims Data Under Artificial Intelligence Architectures: Validation Study (Preprint)

Published: 2020-10-14

Author:

Tran Linh, Chi Lianhua, Bonti Alessio, Abdelrazek Mohamed, Chen Yi-Ping Phoebe

Abstract

BACKGROUND

Cardiovascular disease (CVD) is the greatest health problem in Australia: it kills more people than any other disease and incurs enormous costs for the health care system. In this study, we present a benchmark comparison of various artificial intelligence (AI) architectures for predicting mortality among patients with CVD using structured medical claims data. Compared with other research in the clinical literature, our models are more efficient because they use fewer features, and this study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit.

OBJECTIVE

This study aims to support health clinicians in accurately predicting mortality among patients with CVD using only claims data before a clinic visit.

METHODS

The data set was obtained from Medicare Benefits Scheme and Pharmaceutical Benefits Scheme service information for the period between 2004 and 2014, released by the Department of Health Australia in 2016. It included 346,201 records, corresponding to 346,201 patients. Five AI algorithms were developed and compared in this study: four classical machine learning algorithms (logistic regression [LR], random forest [RF], extra trees [ET], and gradient boosting trees [GBT]) and one deep learning algorithm, a densely connected neural network (DNN). In addition, because deceased patients were a minority in the data set, a separate experiment using the Synthetic Minority Oversampling Technique (SMOTE) was conducted to enrich the data.
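
As an illustration of the oversampling idea behind SMOTE, the following is a minimal sketch in plain Python, not the implementation used in the study. It creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the two-feature toy vectors are assumptions made for this example only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical feature vectors for the minority (deceased) class
deceased = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(deceased, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class. In practice, oversampling would be applied to the training split only, so that evaluation data contain no synthetic records.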

RESULTS

In terms of discrimination, GBT and RF achieved the highest area under the receiver operating characteristic curve (97.8% and 97.7%, respectively), followed by ET (96.8%) and LR (96.4%), whereas DNN was the least discriminative (95.3%). In terms of reliability, LR predictions were the least well calibrated of the five algorithms. Although it increased training time, SMOTE further improved the performance of LR, whereas the other algorithms, especially GBT and DNN, worked well with class-imbalanced data.
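
Discrimination and calibration measure different things, which a small sketch can make concrete. The example below computes the area under the receiver operating characteristic curve as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, and uses the Brier score as one common summary of how close predicted probabilities are to outcomes. The abstract does not state which calibration measure the study used, so the Brier score is an assumption here, and the labels and scores are made-up toy values.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, scores):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Toy data: 1 = deceased, 0 = alive; scores are predicted probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Halving every score keeps the ranking but shifts the probabilities
halved = [s * 0.5 for s in scores]

print(auroc(labels, scores), auroc(labels, halved))
print(brier_score(labels, scores), brier_score(labels, halved))
```

Halving the scores leaves the ranking, and therefore the area under the curve, unchanged, while the Brier score changes. This is why a model can discriminate well and still be poorly calibrated.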

CONCLUSIONS

Compared with other research in the clinical literature involving AI models using claims data to predict patient health outcomes, our models are more efficient because we use a smaller number of features but still achieve high performance. This study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit.

Publisher

JMIR Publications Inc.
