Abstract
Cross-validation remains a popular means of developing and validating artificial intelligence for health care. Numerous subtypes of cross-validation exist. Although tutorials on this validation strategy have been published and some with applied examples, we present here a practical tutorial comparing multiple forms of cross-validation using a widely accessible, real-world electronic health care data set: Medical Information Mart for Intensive Care-III (MIMIC-III). This tutorial explored methods such as K-fold cross-validation and nested cross-validation, highlighting their advantages and disadvantages across 2 common predictive modeling use cases: classification (mortality) and regression (length of stay). We aimed to provide readers with reproducible notebooks and best practices for modeling with electronic health care data. We also described sets of useful recommendations as we demonstrated that nested cross-validation reduces optimistic bias but comes with additional computational challenges. This tutorial might improve the community’s understanding of these important methods while catalyzing the modeling community to apply these guides directly in their work using the published code.
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献