Authors:
Constantin Aliferis, Gyorgy Simon
Abstract
Avoiding overfitted and underfitted (OF, UF) analyses and models is critical for achieving the highest possible generalization performance and is of profound importance for the success of ML/AI modeling. In modern ML/AI practice, OF/UF typically interact with error-estimation procedures and model selection, as well as with sampling and reporting biases, and thus need to be considered together in context. The more general situations of overconfidence (OC) about models and/or under-performing (UP) models can arise in many subtle and not-so-subtle ways, especially in the presence of high-dimensional data, modest or small sample sizes, powerful learners, and imperfect data designs. Because over/underconfidence about models is closely related to model complexity, model selection, error estimation, and sampling (as part of data design), we connect these concepts with the material of the chapters "An Appraisal and Operating Characteristics of Major ML Methods Applicable in Healthcare and Health Science," "Data Design," and "Evaluation." These concepts are also closely related to statistical significance and scientific reproducibility. We examine several common scenarios in which overconfidence in model performance and/or model under-performance occurs, as well as detailed practices for preventing, detecting, and correcting these problems.
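The overconfidence the abstract describes can be made concrete with a minimal sketch (not from the chapter itself): a 1-nearest-neighbor classifier memorizes its training data, so its resubstitution (training-set) accuracy on purely random labels is perfect, while its accuracy on a held-out sample stays near chance. All data and names here are hypothetical, and the example uses only the Python standard library.

```python
import random

random.seed(0)

def make_data(n, d):
    # Synthetic data: features carry no signal about the labels,
    # so no classifier can truly do better than chance (0.5).
    X = [[random.random() for _ in range(d)] for _ in range(n)]
    y = [random.randint(0, 1) for _ in range(n)]
    return X, y

def dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def one_nn_predict(X_train, y_train, x):
    # 1-NN: return the label of the closest training point
    # (pure memorization of the training sample).
    i = min(range(len(X_train)), key=lambda j: dist2(X_train[j], x))
    return y_train[i]

def accuracy(X_train, y_train, X_eval, y_eval):
    hits = sum(one_nn_predict(X_train, y_train, x) == y
               for x, y in zip(X_eval, y_eval))
    return hits / len(y_eval)

X_tr, y_tr = make_data(200, 5)
X_te, y_te = make_data(200, 5)

# Resubstitution estimate: evaluated on the very data that was memorized.
train_acc = accuracy(X_tr, y_tr, X_tr, y_tr)
# Held-out estimate: evaluated on an independent sample.
test_acc = accuracy(X_tr, y_tr, X_te, y_te)

print(f"resubstitution accuracy: {train_acc:.2f}")  # perfect, by memorization
print(f"held-out accuracy:       {test_acc:.2f}")   # near chance (0.5)
```

The gap between the two estimates is exactly the kind of overconfidence the chapter warns about: reporting the resubstitution number would suggest a flawless model where in fact no learnable signal exists.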
Publisher
Springer International Publishing
Cited by
5 articles.