Abstract
High-dimensional classification studies have become widespread across various domains. The large dimensionality, coupled with the possible presence of data contamination, motivates the use of robust, sparse estimation methods that improve model interpretability and ensure that the majority of observations agree with the underlying parametric model. In this study, we propose a robust and sparse estimator for logistic regression models, which simultaneously tackles the presence of outliers and/or irrelevant features. Specifically, we propose the use of L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem in a framework that allows one to pursue optimality guarantees. We use our proposal to investigate the main drivers of honey bee (Apis mellifera) loss through the annual winter loss survey data collected by the Pennsylvania State Beekeepers Association. Whereas previous studies mainly focused on predictive performance, our approach produces a more interpretable classification model and provides evidence for several outlying observations within the survey data. We compare our proposal with existing heuristic methods and non-robust procedures, demonstrating its effectiveness. In addition to the application to honey bee loss, we present a simulation study in which our proposal outperforms other methods across most performance measures and settings.
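The abstract describes the estimator only at a high level. As a rough illustration of the sparsity half of the problem, the sketch below formulates L0-constrained logistic regression as a mixed-integer convex program in CVXPY. The variable names, the big-M bound, and the choice of MOSEK as solver are assumptions made for illustration only, not the authors' implementation; the robust component described in the abstract would add further binary indicators flagging outlying observations.

# Illustrative sketch (assumed formulation, not the paper's code):
# L0-constrained logistic regression as a mixed-integer convex program.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, k = 100, 20, 5              # observations, features, sparsity budget
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:k] = 1.0
y = np.where(X @ beta_true + 0.5 * rng.standard_normal(n) > 0, 1.0, -1.0)

beta = cp.Variable(p)             # regression coefficients
b0 = cp.Variable()                # intercept
z = cp.Variable(p, boolean=True)  # z[j] = 1 if feature j is selected
big_M = 10.0                      # assumed bound on |beta_j|; problem-specific

# Logistic loss with labels coded in {-1, +1}
margins = cp.multiply(y, X @ beta + b0)
loss = cp.sum(cp.logistic(-margins))

constraints = [
    cp.abs(beta) <= big_M * z,    # beta_j forced to 0 whenever z_j = 0
    cp.sum(z) <= k,               # at most k active features (the L0 constraint)
]

prob = cp.Problem(cp.Minimize(loss), constraints)
# Requires a solver that handles mixed-integer exponential-cone programs,
# e.g. MOSEK (assumed installed and licensed here).
prob.solve(solver=cp.MOSEK)
print("selected features:", np.flatnonzero(np.round(z.value)))

In practice the big-M bound must safely dominate the true coefficients, or be replaced by tighter conic (perspective-type) constraints; that refinement is where the mixed-integer conic formulation mentioned in the abstract becomes relevant.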
Funder
NIH
Huck Institutes of the Life Sciences