Non-removal strategy for outliers in predictive models: The PAELLA algorithm case

Published:2019-12-09 Issue:4 Volume:28 Page:418-429
ISSN:1367-0751
Container-title:Logic Journal of the IGPL
language:en

Author:

Castejón-Limas Manuel¹, Alaiz-Moreton Hector², Fernández-Robles Laura¹, Alfonso-Cendón Javier¹, Fernández-Llamas Camino¹, Sánchez-González Lidia¹, Pérez Hilde¹

Affiliation:

1. Department of Mechanical, Informatics and Aerospace Engineering, Universidad de León, Campus de Vegazana, S/N, 24071 León, Spain

2. Department of Electrical, Systems and Automatic Engineering, Universidad de León, Campus de Vegazana, S/N, 24071 León, Spain

Abstract

This paper reports the experience of using the PAELLA algorithm as a helper tool in robust regression, rather than for outlier identification and removal as originally intended. This novel usage takes advantage of the occurrence vector calculated by the algorithm to strengthen the effect of the more reliable samples and lessen the impact of those that would otherwise be considered outliers. To that end, a series of experiments is conducted to learn how best to use the information contained in the occurrence vector. Using a contrived, difficult artificial data set, the first experiment fits a reference predictive model to the whole raw data set. The second experiment reports the results of fitting a similar predictive model after discarding the samples marked as outliers by PAELLA. The third experiment uses the occurrence vector provided by PAELLA to classify the observations into multiple bins and fits every possible model, varying which bins are used for fitting and which are discarded. The fourth experiment introduces a sampling process before fitting, in which the occurrence vector represents the likelihood of an observation being included in the training data set. The fifth experiment treats the sampling process as an internal step, interleaved between the training epochs. The last experiment compares our approach, using weighted neural networks, to a state-of-the-art method.
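The non-removal idea can be illustrated with a minimal sketch. It is not the authors' implementation: the occurrence values below are invented for illustration rather than computed by PAELLA, a weighted straight-line fit stands in for the neural networks used in the paper, and the function names `sample_by_occurrence` and `weighted_line_fit` are hypothetical.

```python
import random


def sample_by_occurrence(samples, occurrence, k, seed=0):
    """Draw k training samples with probability proportional to occurrence.

    Mirrors the idea of treating the occurrence vector as a likelihood of
    inclusion in the training set (assumption: plain sampling with replacement).
    """
    rng = random.Random(seed)
    return rng.choices(samples, weights=occurrence, k=k)


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Toy data: y = 1 + 2x, with the last point replaced by a gross outlier.
xs = list(range(10))
ys = [1 + 2 * x for x in xs]
ys[-1] = 100

# Hypothetical occurrence vector: high for reliable samples, low for the outlier.
occurrence = [20] * 9 + [1]

# Reference fit on the raw data (every sample weighted equally).
a_raw, b_raw = weighted_line_fit(xs, ys, [1] * len(xs))

# Non-removal fit: the outlier stays in the data but carries little weight.
a_w, b_w = weighted_line_fit(xs, ys, occurrence)

# Sampling variant: resample the training set according to occurrence.
train = sample_by_occurrence(list(zip(xs, ys)), occurrence, k=10)
```

In this toy setting the weighted slope `b_w` lands much closer to the true value of 2 than the raw slope `b_raw`, even though no sample was discarded.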

Funder

Ministerio de Economía, Industria y Competitividad

Publisher

Oxford University Press (OUP)

Subject

Logic

Link

http://academic.oup.com/jigpal/article-pdf/28/4/418/33554795/jzz052.pdf
