Author:
Chung Philip,Fong Christine T.,Walters Andrew M.,Yetisgen Meliha,O’Reilly-Shah Vikas N.
Abstract
Abstract
Background
Electronic health records (EHR) contain large volumes of unstructured free-form text notes that richly describe a patient’s health and medical comorbidities. It is unclear if perioperative risk stratification can be performed directly from these notes without manual data extraction. We conduct a feasibility study using natural language processing (NLP) to predict the American Society of Anesthesiologists Physical Status Classification (ASA-PS) as a surrogate measure for perioperative risk. We explore prediction performance using four different model types and compare the use of different note sections versus the whole note. We use Shapley values to explain model predictions and analyze disagreement between model and human anesthesiologist predictions.
Methods
Single-center retrospective cohort analysis of EHR notes from patients undergoing procedures with anesthesia care spanning all procedural specialties during a 5 year period who were not assigned ASA VI and also had a preoperative evaluation note filed within 90 days prior to the procedure. NLP models were trained for each combination of 4 models and 8 text snippets from notes. Model performance was compared using area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC). Shapley values were used to explain model predictions. Error analysis and model explanation using Shapley values was conducted for the best performing model.
Results
Final dataset includes 38,566 patients undergoing 61,503 procedures with anesthesia care. Prevalence of ASA-PS was 8.81% for ASA I, 31.4% for ASA II, 43.25% for ASA III, and 16.54% for ASA IV-V. The best performing models were the BioClinicalBERT model on the truncated note task (macro-average AUROC 0.845) and the fastText model on the full note task (macro-average AUROC 0.865). Shapley values reveal human-interpretable model predictions. Error analysis reveals that some original ASA-PS assignments may be incorrect and the model is making a reasonable prediction in these cases.
Conclusions
Text classification models can accurately predict a patient’s illness severity using only free-form text descriptions of patients without any manual data extraction. They can be an additional patient safety tool in the perioperative setting and reduce manual chart review for medical billing. Shapley feature attributions produce explanations that logically support model predictions and are understandable to clinicians.
Funder
Microsoft Azure Cloud Compute Credits
Bonica Scholars Research Program
Publisher
Springer Science and Business Media LLC
Subject
Anesthesiology and Pain Medicine
Reference59 articles.
1. Rajpurkar P, Zhang J, Lopyrev K, Liang P. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin: Association for Computational Linguistics; 2016. p. 2383–92.
2. Zellers R, Bisk Y, Schwartz R, Choi Y. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics; 2018. p. 93–104.
3. Wang A, Singh A, Michael J, Hill F, Levy O, Bowman S. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Brussels: Association for Computational Linguistics; 2018. p. 353–5.
4. Wang A, Pruksachatkun Y, Nangia N, Singh A, Michael J, Hill F, et al. SuperGLUE: a stickier benchmark for general-purpose language understanding systems. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc.; 2019. p. 3266–80.
5. Zhang Z, Liu J, Razavian N. BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining. In: Proceedings of the 3rd Clinical Natural Language Processing Workshop. Online: Association for Computational Linguistics; 2020. p. 24–34.