Overfitting, Model Tuning, and Evaluation of Prediction Performance

Published: 2022 Page: 109-139
Container-title: Multivariate Statistical Machine Learning Methods for Genomic Prediction

Author:

Montesinos López Osval Antonio, Montesinos López Abelardo, Crossa Jose

Abstract

The overfitting phenomenon happens when a statistical machine learning model learns very well about the noise as well as the signal that is present in the training data. On the other hand, an underfitted phenomenon occurs when only a few predictors are included in the statistical machine learning model that represents the complete structure of the data pattern poorly. This problem also arises when the training data set is too small and thus an underfitted model does a poor job of fitting the training data and unsatisfactorily predicts new data points. This chapter describes the importance of the trade-off between prediction accuracy and model interpretability, as well as the difference between explanatory and predictive modeling: Explanatory modeling minimizes bias, whereas predictive modeling seeks to minimize the combination of bias and estimation variance. We assess the importance and different methods of cross-validation as well as the importance and strategies of tuning that are key to the successful use of some statistical machine learning methods. We explain the most important metrics for evaluating the prediction performance for continuous, binary, categorical, and count response variables.
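The contrast between underfitting and overfitting described in the abstract can be illustrated with a small k-fold cross-validation sketch. This example is not taken from the chapter: the synthetic sine data, the noise level, the polynomial degrees, and the helper name `kfold_mse` are all assumptions chosen for illustration. It compares the mean squared error on the training folds with the error on the held-out folds for models of increasing flexibility.

```python
import numpy as np


def kfold_mse(x, y, degree, k=5, seed=0):
    """Return (mean training MSE, mean validation MSE) over k folds
    for a polynomial regression of the given degree.

    Hypothetical helper written for this illustration only.
    """
    rng = np.random.default_rng(seed)
    # Shuffle the indices once, then split them into k roughly equal folds.
    folds = np.array_split(rng.permutation(len(x)), k)
    train_err, val_err = [], []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only; the validation fold stays unseen.
        coefs = np.polyfit(x[train], y[train], degree)
        train_err.append(np.mean((np.polyval(coefs, x[train]) - y[train]) ** 2))
        val_err.append(np.mean((np.polyval(coefs, x[val]) - y[val]) ** 2))
    return float(np.mean(train_err)), float(np.mean(val_err))


# Synthetic data (assumed for illustration): a sine signal plus Gaussian noise.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-1.0, 1.0, 60))
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=x.size)

# Degree 1 is too rigid for a sine curve; degree 9 is flexible enough
# to start following the noise in the training folds.
results = {d: kfold_mse(x, y, d) for d in (1, 3, 9)}
for d, (train_mse, cv_mse) in results.items():
    print(f"degree {d}: train MSE = {train_mse:.3f}, CV MSE = {cv_mse:.3f}")
```

Because the polynomial models are nested, the training error can only fall as the degree grows, so it cannot by itself reveal overfitting. The held-out error is the quantity to compare when choosing the degree, which is the role cross-validation plays in tuning.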

Funder

Bill and Melinda Gates Foundation

Publisher

Springer International Publishing

Link

https://link.springer.com/content/pdf/10.1007/978-3-030-89010-0_4

References: 36 articles.

1. Brier GW (1950) Verification of forecasts expressed in terms of probability. Mon Weather Rev 78:1–3

2. Buduma N (2017) Fundamentals of deep learning, 1st edn. O’Reilly, Sebastopol, CA

3. Burger SV (2018) Introduction to machine learning with R: rigorous mathematical analysis, 1st edn. O’Reilly, Sebastopol, CA

4. Casella G, Berger RL (2002) Statistical inference. Duxbury, Belmont, CA

5. Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46

Cited by 49 articles.

1. Houston, We Have AI Problem! Quality Issues with Neuroimaging‐Based Artificial Intelligence in Parkinson's Disease: A Systematic Review;Movement Disorders;2024-09-05

2. Assessing future changes in flood susceptibility under projections from the sixth coupled model intercomparison project: case study of Algiers City (Algeria);Natural Hazards;2024-09-02

3. Learning a Single Network for Robust Medical Image Segmentation With Noisy Labels;IEEE Transactions on Medical Imaging;2024-09

4. Answering new urban questions: Using eXplainable AI-driven analysis to identify determinants of Airbnb price in Dublin;Expert Systems with Applications;2024-09

5. Advancing water absorption capacity in hard winter wheat using a multivariate genomic prediction approach;Crop Science;2024-08-24