Outlier detection in spatial error models using modified thresholding-based iterative procedure for outlier detection approach

Published:2024-04-15 Issue:1 Volume:24
ISSN:1471-2288
Container-title:BMC Medical Research Methodology
language:en
Short-container-title:BMC Med Res Methodol

Author:

Cai Jiaxin,Hu Weiwei,Yang Yuhui,Yan Hong,Chen Fangyao

Abstract

Background: Outliers, data points that deviate significantly from the norm, can have a substantial impact on statistical inference and can also provide valuable insights in data analysis. Multiple methods have been developed for outlier detection; however, almost all available approaches fail to consider the spatial dependence and heterogeneity in spatial data. Spatial data has diverse formats and semantics, requiring specialized outlier detection methodology to handle these unique properties. At present, there is limited research on robust spatial outlier detection methods designed specifically for the spatial error model (SEM) structure.

Method: We propose the Spatial-Θ-Iterative Procedure for Outlier Detection (Spatial-Θ-IPOD), which uses a mean-shift vector to identify outliers within the SEM. Our method enables effective detection of spatial outliers while also providing robust coefficient estimates. To assess the performance of our approach, we conducted extensive simulations and applied it to a real-world empirical study using life expectancy data from multiple countries.

Results: Simulation results showed that Spatial-Θ-IPOD outperformed several commonly used methods on the masking and joint detection (JD) indicators, with stable performance even in high-dimensional scenarios. Conversely, the Θ-IPOD method proved ineffective in detecting outliers when spatial correlation was present. Moreover, our model provided reliable coefficient estimation alongside outlier detection. In most cases, the proposed method outperformed the other models, both robust and non-robust. In the empirical study, our proposed model detected outliers and provided valuable insights in the modeling process.

Conclusions: Our proposed Spatial-Θ-IPOD offers an effective solution for detecting spatial outliers under the SEM while providing robust coefficient estimates. Notably, our approach shows its relative superiority even in the presence of high leverage points. By identifying outliers, our method enhances the overall understanding of the data and provides valuable insights for further analysis.
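
The idea of a mean-shift vector inside an SEM can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the model y = Xβ + γ + u with u = ρWu + ε, treats the spatial coefficient ρ as known (in practice it would be estimated), and uses a plain hard-threshold rule with a hand-picked cutoff. The names `hard_threshold`, `spatial_ipod_sketch`, `rho`, and `tau` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def hard_threshold(z, tau):
    """Keep entries whose magnitude exceeds tau; set the rest to zero."""
    return np.where(np.abs(z) > tau, z, 0.0)


def spatial_ipod_sketch(y, X, W, rho, tau, n_iter=200):
    """Alternate between coefficient and mean-shift updates (illustrative only).

    Assumed model: y = X @ beta + gamma + u, with u = rho * W @ u + eps.
    Nonzero entries of the returned gamma flag candidate outliers.
    rho is assumed known here; tau is a user-chosen threshold.
    """
    n = len(y)
    A = np.eye(n) - rho * W                   # spatial filter, A @ u = eps
    X_f = A @ X                               # filtered design matrix
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # step size from the spectral norm of A
    gamma = np.zeros(n)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Coefficient update: least squares on the filtered, shift-corrected data.
        beta, *_ = np.linalg.lstsq(X_f, A @ (y - gamma), rcond=None)
        # Mean-shift update: one gradient step on the filtered residual,
        # followed by hard thresholding to keep gamma sparse.
        resid = y - X @ beta - gamma
        gamma = hard_threshold(gamma + step * (A.T @ (A @ resid)), tau)
    return beta, gamma
```

Calling `spatial_ipod_sketch(y, X, W, rho, tau)` returns the coefficient estimate and the mean-shift vector; `np.flatnonzero(gamma)` then lists the flagged observations. Setting `rho=0` removes the spatial filter and reduces the sketch to a non-spatial thresholding procedure, which is the setting the abstract reports as ineffective under spatial correlation.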

Funder

National Key Research and Development Program of China

National Social Science Fund of China

National Natural Science Foundation of China

Natural Science Basic Research Program of Shaanxi Province

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s12874-024-02208-3.pdf
