Author:
Jain Raunak, Singh Mrityunjai, Rao A. Ravishankar, Garg Rahul
Abstract
Background
Governments worldwide face growing pressure to increase transparency, as citizens demand greater insight into decision-making and public spending. One response is the release of open healthcare data to researchers, since healthcare is one of the top economic sectors. Extracting meaning and value from these datasets requires significant information systems development and computational experimentation. We use a large open health dataset provided by the New York State Statewide Planning and Research Cooperative System (SPARCS), containing 2.3 million de-identified patient records. One field in these records is the patient’s length of stay (LoS) in a hospital, which is crucial for estimating healthcare costs and planning hospital capacity for future needs. Predicting LoS early would therefore be very beneficial for hospitals. Machine learning offers a potential solution and is the focus of the current paper.
Methods
We investigated multiple machine learning techniques, including feature engineering, regression, and classification trees, to predict LoS for all the hospital procedures currently available in the dataset. Whereas many researchers focus on LoS prediction for a specific disease, a distinguishing feature of our model is its ability to handle 285 diagnosis codes from the Clinical Classifications Software (CCS) simultaneously. We focused on the interpretability and explainability of the input features and the resulting models. We developed separate models for newborns and non-newborns.
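As a rough illustration of the "separate models for newborns and non-newborns" idea, the sketch below splits a handful of records into the two groups and fits a different, very simple model to each. Everything in it is an assumption made for illustration: the records are synthetic (not SPARCS data), the field names `newborn`, `birth_weight_g`, `ccs_code`, and `los_days` are hypothetical, and the per-code mean used for non-newborns is only a stand-in baseline, not the regression or tree models investigated in the study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Synthetic records for illustration only; NOT SPARCS data.
# All field names and code labels are hypothetical.
records = [
    {"newborn": True, "birth_weight_g": 1500, "los_days": 20},
    {"newborn": True, "birth_weight_g": 2500, "los_days": 10},
    {"newborn": True, "birth_weight_g": 3000, "los_days": 5},
    {"newborn": True, "birth_weight_g": 3300, "los_days": 2},
    {"newborn": False, "ccs_code": "code_a", "los_days": 4},
    {"newborn": False, "ccs_code": "code_a", "los_days": 6},
    {"newborn": False, "ccs_code": "code_b", "los_days": 9},
]

# Split once, then model each group separately.
newborns = [r for r in records if r["newborn"]]
others = [r for r in records if not r["newborn"]]

# Newborn group: one-feature linear regression of LoS on birth weight.
slope, intercept = fit_line(
    [r["birth_weight_g"] for r in newborns],
    [r["los_days"] for r in newborns],
)

# Non-newborn group: mean LoS per diagnosis code (a naive stand-in baseline).
los_by_code = {}
for r in others:
    los_by_code.setdefault(r["ccs_code"], []).append(r["los_days"])
mean_los_by_code = {code: mean(values) for code, values in los_by_code.items()}
```

Keeping the two groups apart lets each model use the features that are meaningful for it: birth weight only exists for newborns, while a diagnosis grouping is informative for everyone else.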
Results
The results are promising and demonstrate the effectiveness of machine learning in predicting LoS. The best R² scores achieved are 0.82 for newborns using linear regression and 0.43 for non-newborns using CatBoost regression. Focusing on cardiovascular disease refines the predictive capability, achieving an improved R² score of 0.62. The models not only perform well but also provide understandable insights. For instance, birth weight is used to predict LoS in newborns, while diagnosis-related group classification proves valuable for non-newborns.
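For readers unfamiliar with the metric, the R² scores quoted above follow the standard coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below computes it in plain Python; the function name `r2_score` and the sample values are illustrative assumptions, not data or code from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up lengths of stay in days (illustrative only).
actual_los = [2, 4, 6, 8]
predicted_los = [3, 4, 6, 7]
score = r2_score(actual_los, predicted_los)
```

A score of 1 means the predictions match the observed values exactly, and a score of 0 means the model does no better than always predicting the mean LoS.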
Conclusion
Our study showcases the practical utility of machine learning models in predicting LoS at patient admission. The emphasis on interpretability ensures that the models can be easily understood and replicated by other researchers. Healthcare stakeholders, including providers, administrators, and patients, stand to benefit significantly. The findings offer valuable insights for cost estimation and capacity planning, contributing to better healthcare management and delivery.
Publisher
Springer Science and Business Media LLC