Finding the best trade-off between performance and interpretability in predicting hospital length of stay using structured and unstructured data

Published:2023-11-30 Issue:11 Volume:18 Page:e0289795
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Jaotombo Franck, Adorni Luca, Ghattas Badih, Boyer Laurent

Abstract

Objective: This study aims to develop high-performing Machine Learning and Deep Learning models for predicting hospital length of stay (LOS) while enhancing interpretability. We compare the performance and interpretability of models trained only on structured tabular data, models trained only on unstructured clinical text data, and models trained on mixed data.

Methods: The structured data were used to train fourteen classical Machine Learning models, including advanced ensemble trees, neural networks, and k-nearest neighbors. The unstructured data were used to fine-tune a pre-trained Bio Clinical BERT Transformer Deep Learning model. The structured and unstructured data were then merged into a single tabular dataset after vectorization of the clinical text and dimensionality reduction through Latent Dirichlet Allocation. The study used the free and publicly available Medical Information Mart for Intensive Care (MIMIC) III database and the open-source AutoML library AutoGluon. Performance is evaluated against two types of random classifiers, used as baselines.

Results: The best model trained on structured data achieves high performance (ROC AUC = 0.944, PRC AUC = 0.655) with limited interpretability; the most important predictors of prolonged LOS are the levels of blood urea nitrogen and platelets. The Transformer model shows good but lower performance (ROC AUC = 0.842, PRC AUC = 0.375) with richer interpretability, identifying more specific in-hospital factors including procedures, conditions, and medical history. The best model trained on mixed data combines high performance (ROC AUC = 0.963, PRC AUC = 0.746) with a much broader scope of interpretability, covering pathologies of the intestine, the colon, and the blood; infectious diseases; respiratory problems; procedures involving sedation and intubation; and vascular surgery.

Conclusions: Our results outperform most state-of-the-art models in LOS prediction in terms of both performance and interpretability. Data fusion between structured data and unstructured text data may significantly improve performance and interpretability.
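The abstract reports ROC AUC and PRC AUC relative to random-classifier baselines without saying which two baselines were used. The sketch below is only an illustration of that idea, not the authors' code: it assumes a uniform-random scorer and a constant scorer as the two baselines, and it uses synthetic labels with an assumed 10% positive rate. The helper names (`roc_auc`, `pr_auc`, `make_labels`) are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pr_auc(labels, scores):
    """Area under the precision-recall curve, summarised as average precision.

    Items with equal scores are grouped into a single threshold.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    tp = 0
    area = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every item that shares the current score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        area += (recall - prev_recall) * (tp / j)  # tp / j is precision at this threshold
        prev_recall = recall
        i = j
    return area


def make_labels(n, prevalence, seed):
    """Synthetic binary labels; the prevalence is an assumption, not taken from the paper."""
    rng = random.Random(seed)
    return [1 if rng.random() < prevalence else 0 for _ in range(n)]


if __name__ == "__main__":
    labels = make_labels(1000, 0.1, seed=0)
    rng = random.Random(1)
    baselines = {
        "uniform-random scorer": [rng.random() for _ in labels],
        "constant scorer": [0.5] * len(labels),
    }
    print("prevalence:", sum(labels) / len(labels))
    for name, scores in baselines.items():
        print(name, "ROC AUC:", roc_auc(labels, scores), "PRC AUC:", pr_auc(labels, scores))
```

An uninformative scorer sits near 0.5 on ROC AUC but near the positive-class prevalence on PRC AUC, so the reported PRC AUC values are best read against the prevalence of prolonged LOS in the data rather than against 0.5.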

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References: 75 articles.

1. A systematic review of the prediction of hospital length of stay: Towards a unified framework.;K Stone;PLOS Digit Health,2022

2. Prediction of Length of Stay of First-Ever Ischemic Stroke;K-C Chang;Stroke,2002

3. Health at a Glance 2019

4. Health at a Glance 2021

5. Prediction of general medical admission length of stay with natural language processing and deep learning: a pilot study.;S Bacchi;Intern Emerg Med,2020

Cited by 2 articles.

1. AI-Based Electroencephalogram Analysis in Rodent Models of Epilepsy: A Systematic Review;Applied Sciences;2024-08-22

2. Computer-aided diagnostic system with automated deep learning method based on the AutoGluon framework improved the diagnostic accuracy of early esophageal cancer;Journal of Gastrointestinal Oncology;2024-04