Feature selection algorithms in generalized additive models under concurvity

Published: 2022-11-03
ISSN: 0943-4062
Container-title: Computational Statistics
Language: en
Short-container-title: Comput Stat

Author:

Kovács László

Abstract

In this paper, the properties of 10 different feature selection algorithms for generalized additive models (GAMs) are compared on one simulated and two real-world datasets under concurvity. Concurvity can be interpreted as redundancy in the feature set of a GAM. Like multicollinearity in linear models, concurvity causes unstable parameter estimates in GAMs and makes the marginal effects of features harder to interpret. Feature selection algorithms for GAMs can be separated into four clusters: stepwise, boosting, regularization and concurvity-controlled methods. Our numerical results show that algorithms with no constraints on concurvity tend to select a large feature set without significant improvements in predictive performance compared to a more parsimonious feature set. A large feature set is accompanied by harmful concurvity in the proposed models. To tackle the concurvity phenomenon, recent feature selection algorithms such as the mRMR and the HSIC-Lasso incorporate constraints on concurvity in their objective functions. However, these algorithms interpret concurvity as a pairwise non-linear relationship between features, so they do not account for the case in which a feature can be accurately estimated as a multivariate function of several other features. This is confirmed by our numerical results. Our own solution to the problem, a hybrid genetic–harmony search algorithm (HA), introduces constraints on multivariate concurvity directly. Due to these constraints, the HA proposes a small, non-redundant feature set with predictive performance similar to that of models with far more features.
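The distinction between pairwise and multivariate redundancy can be illustrated with a small simulation. The sketch below is not taken from the paper: it uses ordinary least squares as a simplified stand-in for the smooth terms of a GAM, and the helper `r_squared`, the sample size and the four-feature setup are illustrative assumptions. A target feature is built as the sum of four independent features, so each single feature explains only about a quarter of its variance, while the four together reproduce it exactly.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X, with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

# Hypothetical setup: four independent features and a fifth that is their sum.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))
target = X.sum(axis=1)

# Pairwise view: how well does each single feature explain the target?
pairwise = [r_squared(X[:, [j]], target) for j in range(X.shape[1])]

# Multivariate view: how well do all four features explain it jointly?
joint = r_squared(X, target)

print("pairwise R^2:", np.round(pairwise, 3))
print("joint R^2:   ", round(joint, 3))
```

Under these assumptions, a criterion that inspects only pairs of features sees moderate dependence, whereas the joint fit shows that the fifth feature is fully redundant. This is the situation the abstract describes as multivariate concurvity.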

Funder

Ministry For Innovation and Technology Hungary

Corvinus University of Budapest

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics; Statistics, Probability and Uncertainty; Statistics and Probability

Link

https://link.springer.com/content/pdf/10.1007/s00180-022-01292-7.pdf

References (47 articles)

1. Altman N, Krzywinski M (2016) Analyzing outliers: Influential or nuisance? Nat Methods 13(4):281–283

2. Amodio S, Aria M, D’Ambrosio A (2014) On concurvity in nonlinear and nonparametric regression models. Statistica 74(1):85–98