Abstract
AbstractPredicting hospitalization from nurse triage notes has significant implications in health informatics. To this end, we compared the performance of the deep-learning transformer-based model, bio-clinical-BERT, with a bag-of-words logistic regression model incorporating term frequency-inverse document frequency (BOW-LR-tf-idf). A retrospective analysis was conducted using data from 1,391,988 Emergency Department patients at the Mount Sinai Health System spanning 2017-2022. The models were trained on four hospitals’ data and externally validated on a fifth. Bio-clinical-BERT achieved higher AUCs (0.82, 0.84, and 0.85) compared to BOW-LR-tf-idf (0.81, 0.83, and 0.84) across training sets of 10,000, 100,000, and ∼1,000,000 patients respectively. Notably, both models proved effective at utilizing triage notes for prediction, despite the modest performance gap. Importantly, our findings suggest that simpler machine learning models like BOW-LR-tf-idf could serve adequately in resource-limited settings. Given the potential implications for patient care and hospital resource management, further exploration of alternative models and techniques is warranted to enhance predictive performance in this critical domain.
Publisher
Cold Spring Harbor Laboratory