Affiliation:
1. Institute of Information Systems, University of Hamburg, Von-Melle-Park 5, 20146 Hamburg, Germany
2. FOM University of Applied Sciences, Leimkugelstr. 6, 45141 Essen, Germany
Abstract
While generative artificial intelligence has gained popularity, e.g., for the creation of images, it can also be used to create synthetic tabular data. This holds great potential, especially for the healthcare industry, where data are often scarce and subject to privacy restrictions. For instance, synthetic electronic health records (EHR) promise to improve the use of machine learning algorithms, which usually require large amounts of data. This also applies to the prediction of the patient length of stay (LOS), a key measure for hospitals and one of the core tools decision makers use to plan the allocation of resources. This paper therefore aims to add to the still-young research on the application of generative adversarial nets (GAN) to tabular EHR. It does so with the intention of leveraging the advantages of synthetic data for LOS prediction, in order to contribute to the efficiency-enhancing and cost-saving aspirations of hospitals and insurance companies. To this end, it examines whether synthetic data generated using GANs can serve as a proxy for scarce real-world EHR in the patient LOS multi-class classification task. The Conditional Tabular GAN (CTGAN) and the Copula GAN are selected as the underlying models, as they are state-of-the-art GAN architectures designed for generating synthetic tabular data. The CTGAN is found to be the superior model for this use case. Nevertheless, the paper shows that there is still room for improvement when applying state-of-the-art GAN architectures to clinical healthcare data.
Funder
Open Access Fund of the University of Hamburg
Subject
Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development; Building and Construction