Abstract
Feature selection is one of the most relevant processes in any methodology for creating a statistical learning model. Usually, existing algorithms establish some criterion to select the most influential variables, discarding those that do not contribute any relevant information to the model. This methodology makes sense in a static situation, where the joint distribution of the data does not vary over time. However, when dealing with real data, it is common to encounter the problem of dataset shift and, specifically, changes in the relationships between variables (concept shift). In this case, the influence of a variable cannot be the only indicator of its quality as a regressor, since the relationship learned in the training phase may no longer correspond to the current situation. To tackle this problem, our approach establishes a direct relationship between Shapley values and prediction errors, operating at a more local level to effectively detect the individual biases introduced by each variable. The proposed methodology is evaluated through various examples, including synthetic scenarios mimicking sudden and incremental shift situations, as well as two real-world cases characterized by concept shifts. Additionally, we perform three analyses of standard situations to assess the algorithm's robustness in the absence of shifts. The results demonstrate that our proposed algorithm significantly outperforms state-of-the-art feature selection methods in concept shift scenarios, while matching the performance of existing methodologies in static situations.
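The following is a minimal sketch of the general idea of relating local Shapley contributions to prediction errors on recent data, not the paper's exact algorithm. It assumes scikit-learn and the shap package, and the per-feature correlation heuristic is an illustrative assumption: a feature whose Shapley contributions track the signed error on new data is a candidate for having a shifted (and now misleading) relationship with the target.

```python
# Illustrative sketch (not the authors' exact method): flag features whose
# local Shapley contributions correlate with prediction errors on new data,
# suggesting the relationship learned at training time no longer holds.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Training data: y depends on x0 and x1.
X_train = rng.normal(size=(500, 3))
y_train = 2 * X_train[:, 0] + X_train[:, 1] + rng.normal(scale=0.1, size=500)

# Recent data with a concept shift: the effect of x0 is reversed.
X_new = rng.normal(size=(200, 3))
y_new = -2 * X_new[:, 0] + X_new[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

explainer = shap.TreeExplainer(model)
phi = explainer.shap_values(X_new)        # (n_samples, n_features) local contributions
errors = model.predict(X_new) - y_new     # signed prediction errors on the new data

# A feature whose contributions correlate strongly with the error is pushing
# predictions away from the observed target under the shifted distribution.
for j in range(X_new.shape[1]):
    corr = np.corrcoef(phi[:, j], errors)[0, 1]
    print(f"feature {j}: corr(SHAP, error) = {corr:+.2f}")
```

In this toy example, feature 0 (whose effect was reversed after training) should show a markedly higher correlation with the error than the unaffected features, which is the kind of local, per-variable signal the abstract refers to.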
Funder
Centro para el Desarrollo Tecnológico Industrial
Ministerio de Universidades
Universidad Politécnica de Madrid
Publisher
Springer Science and Business Media LLC