I TRIED A BUNCH OF THINGS: THE DANGERS OF UNEXPECTED OVERFITTING IN CLASSIFICATION

Published: 2016-10-03

Author:

Powell Michael, Hosseini Mahan, Collins John, Callahan-Flintoft Chloe, Jones William, Bowman Howard, Wyble Brad

Abstract

Machine learning is a powerful set of techniques that has enhanced the ability of neuroscientists to interpret information collected through EEG, fMRI, and MEG data. With these powerful techniques comes the danger of overfitting hyper-parameters, which can render results invalid and cause a failure to generalize beyond the data set. We refer to this problem as ‘over-hyping’ and show that it is pernicious despite commonly used precautions. In particular, over-hyping occurs when an analysis is run repeatedly with slightly different analysis parameters and one set of results is selected based on the outcome of those analyses. When this is done, the resulting method is unlikely to generalize to a new dataset, rendering it a partially, or perhaps even completely, spurious result that will not be valid outside of the data used in the original analysis. While it is commonly assumed that cross-validation is an effective protection against such spurious results generated through overfitting or over-hyping, this is not actually true. In this article, we show that both one-shot and iterative optimization of an analysis are prone to over-hyping, despite the use of cross-validation. We demonstrate that non-generalizable results can be obtained even on non-informative (i.e. random) data by modifying hyper-parameters in seemingly innocuous ways. We recommend a number of techniques for limiting over-hyping, such as lock-boxes, blind analyses, pre-registrations, and nested cross-validation. These techniques are common in other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques in the neurosciences.
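
The over-hyping effect described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a toy nearest-centroid classifier in which the choice of feature subset stands in for a hyper-parameter. All function names, sample sizes, and the number of configurations tried are hypothetical. The data are pure noise, so any accuracy meaningfully above chance comes from selection alone.

```python
import random


def make_noise_data(n_samples, n_features, rng):
    """Pure noise: the features carry no information about the labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced, arbitrary labels
    return X, y


def fit_centroids(X, y, features):
    """Nearest-centroid training: per-class mean over the chosen features."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return centroids


def accuracy(centroids, X, y, features):
    """Fraction of samples whose nearest centroid matches the label."""
    correct = 0
    for x, t in zip(X, y):
        dists = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        correct += min(dists, key=dists.get) == t
    return correct / len(y)


def cv_accuracy(X, y, features, k=5):
    """Plain k-fold cross-validated accuracy for one feature subset."""
    scores = []
    for fold in range(k):
        test = set(range(fold, len(y), k))
        train_X = [x for i, x in enumerate(X) if i not in test]
        train_y = [t for i, t in enumerate(y) if i not in test]
        test_X = [X[i] for i in sorted(test)]
        test_y = [y[i] for i in sorted(test)]
        model = fit_centroids(train_X, train_y, features)
        scores.append(accuracy(model, test_X, test_y, features))
    return sum(scores) / k


def run_demo(seed=0, n_tries=200):
    """Return (best cross-validated accuracy, lock-box accuracy) on noise."""
    rng = random.Random(seed)
    X, y = make_noise_data(40, 20, rng)              # analysis set
    lock_X, lock_y = make_noise_data(1000, 20, rng)  # lock box, untouched until the end

    # "Try a bunch of things": keep whichever feature subset cross-validates best.
    best_features, best_cv = None, -1.0
    for _ in range(n_tries):
        features = rng.sample(range(20), 3)
        score = cv_accuracy(X, y, features)
        if score > best_cv:
            best_features, best_cv = features, score

    # Open the lock box exactly once, with the selected configuration.
    model = fit_centroids(X, y, best_features)
    lock_acc = accuracy(model, lock_X, lock_y, best_features)
    return best_cv, lock_acc


if __name__ == "__main__":
    best_cv, lock_acc = run_demo()
    print(f"best cross-validated accuracy: {best_cv:.2f}")
    print(f"lock-box accuracy:             {lock_acc:.2f}")
```

In this sketch the best cross-validated score is expected to sit well above chance even though the data contain no signal, while the lock-box score stays near 50%. The gap between the two is the over-hyping that cross-validation alone does not reveal. The lock box is made unrealistically large here only so that its estimate is stable.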

Publisher

Cold Spring Harbor Laboratory

References: 24 articles.

1. A survey of cross-validation procedures for model selection

2. Bouthillier, X., Varoquaux, G. (2020). Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020. [Research Report] Inria Saclay Ile de France. hal-02447823

3. Data-driven region-of-interest selection without inflating Type I error rate

4. On over-fitting in model selection and subsequent selection bias in performance evaluation;The Journal of Machine Learning Research,2010

5. Evidence for a two-peak structure in the A2 meson;Physics Letters B,1967

Cited by 17 articles.

1. Prediction of amputation risk of patients with diabetic foot using classification algorithms: A clinical study from a tertiary center;International Wound Journal;2024-01

2. The ABC recommendations for validation of supervised machine learning results in biomedical sciences;Frontiers in Big Data;2022-09-27

3. Neural fragility as an EEG marker of the seizure onset zone;Nature Neuroscience;2021-08-05

4. How Do Machines Learn? Artificial Intelligence as a New Era in Medicine;Journal of Personalized Medicine;2021-01-07

5. Sample size evolution in neuroimaging research: An evaluation of highly-cited studies (1990–2012) and of latest practices (2017–2018) in high-impact journals;NeuroImage;2020-11