Narrowing the gap: expected versus deployment performance

Published: 2023-06-13
Volume: 30, Issue: 9, Pages: 1474–1485
ISSN: 1067-5027
Journal: Journal of the American Medical Informatics Association
Language: English

Authors:

Zhou Alice X (1, 2); Aczon Melissa D (1, 2); Laksana Eugene (1, 2); Ledbetter David R (3); Wetzel Randall C (1, 2, 4)

Affiliations:

1. Department of Anesthesiology and Critical Care Medicine, Children’s Hospital Los Angeles, Los Angeles, California, USA

2. Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit, Children’s Hospital Los Angeles, Los Angeles, California, USA

3. Advanced Analytics for Healthcare, KPMG International Limited, Dallas, Texas, USA

4. Department of Pediatrics and Anesthesiology, University of Southern California Keck School of Medicine, Los Angeles, California, USA

Abstract

Objectives: Successful model development requires both an accurate a priori understanding of future performance and high performance on deployment. Optimistic estimates of model performance that are not realized in real-world clinical settings can contribute to nonuse of predictive models. Using 2 tasks, predicting ICU mortality and Bi-Level Positive Airway Pressure failure, this study quantified (1) how well internal test performance, derived from different methods of partitioning data into development and test sets, estimates the future deployment performance of Recurrent Neural Network models, and (2) the effect on model performance of including older data in the training set.

Materials and Methods: The cohort consisted of patients admitted between 2010 and 2020 to the Pediatric Intensive Care Unit of a large quaternary children’s hospital. Data from 2010–2018 were partitioned into different development and test sets to measure internal test performance. Deployable models were trained on 2010–2018 data and assessed on 2019–2020 data, which served to represent a real-world deployment scenario. Optimism, defined as the amount by which internal test performance overestimates deployed performance, was measured. The performances of deployable models were also compared with one another to quantify the effect of including older data during training.

Results, Discussion, and Conclusion: Longitudinal partitioning methods, in which models are tested on data newer than the development set, yielded the least optimism. Including older years in the training dataset did not degrade deployable model performance. Using all available data for model development fully leveraged longitudinal partitioning, allowing performance to be measured year to year.
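As a minimal sketch of the two quantities the abstract relies on, the Python below contrasts a random partition with a longitudinal one and defines optimism as internal test performance minus deployed performance. It assumes each admission is reduced to a hypothetical `Admission(year, outcome)` record and uses AUROC only as an example metric; `random_split`, `longitudinal_split`, `auroc`, and `optimism` are illustrative names, not the authors' code, and no Recurrent Neural Network is trained here. The year boundaries follow the abstract (2010–2018 for development, 2019–2020 for deployment), while holding out 2018 as the internal longitudinal test year is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Admission:
    """Hypothetical record: one PICU admission reduced to its year and a binary outcome."""
    year: int
    outcome: int  # 1 = event (e.g. ICU death), 0 = no event


def random_split(admissions, test_fraction=0.2, seed=0):
    """Random partition: test admissions come from the same years as the development set."""
    rng = random.Random(seed)
    shuffled = list(admissions)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (development, test)


def longitudinal_split(admissions, last_dev_year):
    """Longitudinal partition: every test admission is newer than every development admission."""
    dev = [a for a in admissions if a.year <= last_dev_year]
    test = [a for a in admissions if a.year > last_dev_year]
    return dev, test


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def optimism(internal_test_score, deployment_score):
    """Optimism: how far the internal test estimate overstates deployed performance."""
    return internal_test_score - deployment_score


# Hypothetical synthetic cohort: 50 admissions per year, 2010 to 2020 inclusive.
admissions = [
    Admission(year=y, outcome=int(i % 7 == 0)) for y in range(2010, 2021) for i in range(50)
]
development_pool = [a for a in admissions if a.year <= 2018]  # data available before deployment
deployment_set = [a for a in admissions if a.year >= 2019]    # stands in for deployment

dev_random, test_random = random_split(development_pool)
# Assumption: the internal longitudinal test set is the final development year (2018).
dev_long, test_long = longitudinal_split(development_pool, last_dev_year=2017)
```

The random split mixes years across development and test sets, so its test score reflects the same period the model learned from; the longitudinal split tests only on later admissions, which is closer to how a deployed model encounters newer patients. A model's score on `deployment_set` would then be subtracted from either internal test score with `optimism` to compare partitioning methods.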

Funder

L.K. Whittier Foundation

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

https://academic.oup.com/jamia/article-pdf/30/9/1474/51141563/ocad100.pdf


Cited by 1 article

1. Perspectives on implementing models for decision support in clinical care. Journal of the American Medical Informatics Association, 2023-08-18.