The application of machine learning to predict high-cost patients: A performance-comparison of different models using healthcare claims data

Published: 2023-01-18 Issue: 1 Volume: 18 Page: e0279540
ISSN: 1932-6203
Container-title: PLOS ONE
Language: en
Short-container-title: PLoS ONE

Author:

Langenberger Benedikt, Schulte Timo, Groene Oliver

Abstract

Our aim was to predict future high-cost patients with machine learning using healthcare claims data. We applied a random forest (RF), a gradient boosting machine (GBM), an artificial neural network (ANN) and a logistic regression (LR) to predict high-cost patients in the following year. To this end, we used routinely collected sickness fund claims and cost data from the years 2016, 2017 and 2018. Various specifications of each algorithm were trained and cross-validated on training data (n = 20,984) with claims and cost data from 2016 and outcomes from 2017. The best-performing specification of each algorithm was selected based on validation dataset performance. For performance comparison, the selected models were applied to unseen data with features from 2017 and outcomes from 2018 (n = 21,146). The RF was the best-performing algorithm as measured by the area under the receiver operating characteristic curve (AUC), with a value of 0.883 (95% confidence interval (CI): 0.872–0.893) on test data, followed by the GBM (AUC = 0.878; 95% CI: 0.867–0.889). The ANN (AUC = 0.846; 95% CI: 0.834–0.857) and LR (AUC = 0.839; 95% CI: 0.826–0.852) were significantly outperformed by the GBM and the RF. All ML algorithms and the LR achieved ‘good’ performance (i.e. 0.9 > AUC ≥ 0.8). We were able to develop machine learning models that predict high-cost patients with ‘good’ performance using routinely collected sickness fund claims and cost data. We found that tree-based models performed best and outperformed the ANN and LR.
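
The evaluation metric named in the abstract can be illustrated with a minimal sketch. It computes an AUC from predicted scores, attaches a percentile-bootstrap 95% interval, and checks the 'good' band (0.9 > AUC ≥ 0.8) quoted above. This is not the authors' code: the abstract does not say how the confidence intervals were obtained, so the bootstrap is an assumption, the function names (auc, bootstrap_ci, is_good) are hypothetical, and the data are synthetic.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)]
    return lo, hi


def is_good(a):
    """The 'good' band quoted in the abstract: 0.9 > AUC >= 0.8."""
    return 0.8 <= a < 0.9


# Synthetic scores only: positives are drawn from a shifted normal distribution.
rng = random.Random(42)
y = [1 if rng.random() < 0.1 else 0 for _ in range(2000)]
s = [rng.gauss(1.5 if label else 0.0, 1.0) for label in y]

point = auc(y, s)
low, high = bootstrap_ci(y, s, n_boot=200)
print(f"AUC = {point:.3f} (95% CI: {low:.3f}-{high:.3f}), good: {is_good(point)}")
```

Applied to the point estimates reported in the abstract, is_good returns True for all four models (0.883, 0.878, 0.846, 0.839).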

Funder

OptiMedis AG

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References: 61 articles.

1. A 3-year study of high-cost users of health care.;WP Wodchis;CMAJ,2016

2. The concentration of health care expenditures in the U.S. and predictions of future spending.;SB Cohen;JEM,2016

3. Wettbewerb im Gesundheitswesen – eine Gesundheitssystemperspektive.;R Busse;Zeitschrift für Evidenz, Fortbildung und Qualität im Gesundheitswesen,2009

4. Hochkostenversicherte in Deutschland: Leistungs- und Kostenprofile.;L Lange;Z Evid Fortbild Qual Gesundhwes,2020

5. High-cost health care users in Ontario, Canada: demographic, socio-economic, and health status characteristics.;LC Rosella;BMC Health Serv Res,2014

Cited by 10 articles.

1. Financial management, efficiency, and care quality: A systematic review in the context of Health 4.0;Health Services Management Research;2024-08-28

2. A predictive modeling approach for Taiwanese diagnosis-related groups medical costs: A focus on laparoscopic appendectomy;Tungs' Medical Journal;2024-07-01

3. Machine-learning based prediction for high health care utilizers using a multi-institution diabetes registry: model training and evaluation (Preprint);JMIR AI;2024-03-16

4. Machine-learning based prediction for high health care utilizers using a multi-institution diabetes registry: model training and evaluation. (Preprint);2024-03-16

5. Machine learning for an explainable cost prediction of medical insurance;Machine Learning with Applications;2024-03