Producing Fast and Convenient Machine Learning Benchmarks in R with the stressor Package

Published: 2024
Pages: 239-258
ISSN: 1680-743X
Container-title: Journal of Data Science
Language: en

Authors:

Haycock Sam, Bean Brennan, Burchfield Emily

Abstract

The programming overhead required to implement machine learning workflows creates a barrier for many discipline-specific researchers with limited programming experience. The stressor package provides an R interface to Python’s PyCaret package, which automatically tunes and trains 14-18 machine learning (ML) models for use in accuracy comparisons. In addition to providing an R interface to PyCaret, stressor also contains functions that facilitate synthetic data generation and variants of cross-validation that allow for easy benchmarking of the ability of machine-learning models to extrapolate or compete with simpler models on simpler data forms. We show the utility of stressor on two agricultural datasets, one using classification models to predict crop suitability and another using regression models to predict crop yields. Full ML benchmarking workflows can be completed in only a few lines of code with relatively small computational cost. The results, and more importantly the workflow, provide a template for how applied researchers can quickly generate accuracy comparisons of many machine learning models with very little programming.

Publisher

School of Statistics, Renmin University of China
