CARRoT: R-package for predictive modelling by means of regression, adjusted for multiple regularisation methods

Published:2023-10-12 Issue:10 Volume:18 Page:e0292597
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Bazarova Alina, Raseta Marko

Abstract

We present an R package for predictive modelling, CARRoT (Cross-validation, Accuracy, Regression, Rule of Ten). CARRoT is a tool for initial exploratory analysis of data: it performs an exhaustive search for the regression model with the best predictive power, using heuristic ‘rules of thumb’ and expert knowledge as regularisation parameters, and it validates the model internally with multiple hold-outs. During model selection the package can take into account multiple factors, such as collinearity of the predictors, events per variable (EPV) rules and R-squared statistics. Other constraints, such as forcing specific terms into the model and restricting the complexity of the predictive models, can also be applied, and pairwise and three-way interactions between variables can be included. The candidate models are ranked by predictive power, which is assessed via multiple hold-out procedures that can be parallelised to reduce computation time. The models with the highest average predictive power over all hold-outs are returned. Predictive power is quantified as absolute and relative error for continuous outcomes, and as accuracy and AUROC for categorical outcomes. In this paper we briefly present the statistical framework of the package and discuss the complexity of the underlying algorithm. Moreover, using CARRoT and a number of datasets available in R, we compare different model selection techniques: selection based on EPVs alone, on EPVs and R-squared statistics, on lasso regression, on including only statistically significant predictors, and on stepwise forward selection.
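The search described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and does not use or reproduce the CARRoT API: every function name below is hypothetical. It assumes a continuous outcome, ordinary least squares as the regression, mean absolute error as the measure of predictive power, and a "rule of ten" that caps the number of predictors at the sample size divided by ten. Collinearity checks, R-squared filters, interactions and parallelisation are left out.

```python
import itertools
import random


def max_predictors(n_samples, epv=10):
    # Assumed rule of ten: at most one predictor per `epv` observations.
    return n_samples // epv


def solve(a, b):
    # Solve a @ x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def design(X, subset):
    # Intercept column followed by the selected predictors.
    return [[1.0] + [row[j] for j in subset] for row in X]


def fit_ols(rows, y):
    # Least squares via the normal equations (fails on collinear columns).
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def holdout_error(X, y, subset, n_holdouts=20, test_fraction=0.25, seed=0):
    # Average mean absolute error over repeated random train/test splits.
    # The fixed seed gives every candidate subset the same splits.
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, int(n * test_fraction))
    errors = []
    for _ in range(n_holdouts):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        beta = fit_ols(design([X[i] for i in train], subset),
                       [y[i] for i in train])
        test_rows = design([X[i] for i in test], subset)
        preds = [sum(b * v for b, v in zip(beta, row)) for row in test_rows]
        errors.append(sum(abs(p - y[i]) for p, i in zip(preds, test)) / n_test)
    return sum(errors) / len(errors)


def best_subset(X, y, epv=10, **kwargs):
    # Exhaustive search over all predictor subsets allowed by the EPV cap.
    n_features = len(X[0])
    limit = min(n_features, max_predictors(len(y), epv))
    candidates = (s for k in range(1, limit + 1)
                  for s in itertools.combinations(range(n_features), k))
    return min(candidates, key=lambda s: holdout_error(X, y, s, **kwargs))
```

For example, with `X` as a list of numeric rows and `y` as a list of outcomes, `best_subset(X, y, n_holdouts=10)` returns a tuple of column indices. The number of candidate subsets grows combinatorially with the number of predictors, which is why a cap on model size matters for an exhaustive search.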

Funder

Helmholtz Association Initiative and Networking Fund within the framework of Helmholtz AI

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Reference 35 articles.

1. Meaningful Analysis of Small Data Sets: A Clinician’s Guide;J Collins;Proceedings of Greenville Health System,2017

2. Medical Image Data and Datasets in the Era of Machine Learning–Whitepaper from the 2016 C-MIMI Meeting Dataset Session;MD Kohli;J Digit Imaging,2017

3. A simulation study of the number of events per variable in logistic regression analysis;P Peduzzi;Journal of Clinical Epidemiology,1996

4. Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression;E Vittinghoff;American Journal of Epidemiology,2007

5. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets;EW Steyerberg;Statistics in Medicine,2000

Cited by 2 articles.

1. A nomograph model for predicting the risk of diabetes nephropathy;2024-04-01

2. CARRoT: Predicting Categorical and Continuous Outcomes Using One in Ten Rule;CRAN: Contributed Packages;2018-04-06