Abstract
We experiment with recent ensemble machine learning methods for estimating healthcare costs, using Finnish data that combine rich individual-level information on healthcare costs, socioeconomic status and diagnoses from multiple registries. Our data are a random 10% sample (553,675 observations) of the Finnish population in 2017. Using annual healthcare cost in 2017 as the response variable, we compare the performance of random forest, Gradient Boosting Machine (GBM) and eXtreme Gradient Boosting (XGBoost) with that of linear regression. Because machine learning methods are often seen as unsuitable for risk adjustment applications owing to their relative opaqueness, we also introduce visualizations from the machine learning literature that help interpret the contribution of individual variables to the prediction. Our results show that ensemble machine learning methods can improve predictive performance, with all of them significantly outperforming linear regression, and that a certain level of interpretation can be provided for them. We also find that individual-level socioeconomic variables improve prediction accuracy and that their effect is larger for the machine learning methods. However, we find that the predictions used for funding allocations are sensitive to model selection, which highlights the need for comprehensive robustness testing when estimating risk adjustment models used in applications.
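The sensitivity of funding allocations to model selection can be illustrated with a minimal sketch. It assumes that individual predictions are summed by region and that each region's allocation is its share of total predicted cost; this aggregation rule, the function name `allocation_shares`, the region labels and all numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def allocation_shares(regions, predicted_costs):
    """Sum predicted individual costs by region and return each region's share of the total."""
    totals = defaultdict(float)
    for region, cost in zip(regions, predicted_costs):
        totals[region] += cost
    grand_total = sum(totals.values())
    return {region: total / grand_total for region, total in totals.items()}


# Hypothetical toy data: region of residence and predicted annual cost (EUR) per person.
regions = ["A", "A", "B", "B", "C", "C"]
pred_linear = [1000.0, 3000.0, 2000.0, 2000.0, 1500.0, 500.0]    # stand-in for linear regression
pred_ensemble = [800.0, 4200.0, 1800.0, 1700.0, 1000.0, 500.0]   # stand-in for an ensemble model

shares_linear = allocation_shares(regions, pred_linear)
shares_ensemble = allocation_shares(regions, pred_ensemble)

# Percentage-point change in each region's share when switching from one model to the other.
share_shift = {r: 100 * (shares_ensemble[r] - shares_linear[r]) for r in shares_linear}

for region in sorted(share_shift):
    print(
        f"{region}: {shares_linear[region]:.3f} -> {shares_ensemble[region]:.3f} "
        f"({share_shift[region]:+.1f} pp)"
    )
```

In this toy example both sets of predictions sum to the same total, yet the regional shares differ. A robustness check of this kind asks how large such shifts are across plausible model specifications.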
Funder
Academy of Finland
Horizon 2020 Framework Programme
Finnish Institute for Health and Welfare
Publisher
Springer Science and Business Media LLC