A Comparative Study for Outlier Detection Methods in High Dimensional Text Data

Published: 2022-11-28
Volume: 13, Issue: 1, Pages: 5-17
ISSN: 2449-6499
Journal: Journal of Artificial Intelligence and Soft Computing Research
Language: en

Author:

Park Cheong Hee¹

Affiliation:

1. Department of Computer Science and Engineering, Chungnam National University, 220 Gung-dong, Yuseong-gu, Daejeon, 305-763, Korea

Abstract

Outlier detection aims to find data samples that differ significantly from the rest of the data. Various outlier detection methods have been proposed and have been shown to detect anomalies in many practical problems. However, in high dimensional data, conventional outlier detection methods often behave unexpectedly due to a phenomenon known as the curse of dimensionality. In this paper, we compare and analyze outlier detection performance in various experimental settings, focusing on text data whose dimensionality is typically in the tens of thousands. Experimental setups were simulated to compare the performance of outlier detection methods in unsupervised versus semi-supervised modes and on uni-modal versus multi-modal data distributions. The performance of outlier detection methods based on dimension reduction is also compared, and the use of k-NN distance in high dimensional data is discussed. This experimental comparison across various settings can provide insights into applying outlier detection methods to high dimensional data.
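To make the k-NN distance idea concrete, here is a minimal sketch. It assumes Euclidean distance and scores each point by the distance to its k-th nearest neighbour; this is one common variant, and the abstract does not say which exact scoring rule or distance the paper uses. The helper names `knn_outlier_scores` and `relative_contrast` are hypothetical, and the relative-contrast check, (max - min) / min of distances from a query point, is an illustrative diagnostic for the curse of dimensionality, not a method taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_outlier_scores(points, k):
    """Hypothetical helper: score each point by the distance to its
    k-th nearest neighbour. Larger scores suggest the point lies far
    from the rest of the data. A point is not its own neighbour."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def relative_contrast(points, query):
    """Hypothetical diagnostic: (max - min) / min of distances from query.
    Values near 0 mean the nearest and farthest points are almost equally
    far away, which weakens any distance-based outlier score."""
    dists = [euclidean(query, p) for p in points]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    random.seed(0)
    # Toy 2-D data: a tight cluster plus one distant point at index 5.
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]]
    scores = knn_outlier_scores(data, k=2)
    print(max(range(len(data)), key=scores.__getitem__))  # prints 5

    # Relative contrast of uniform random data tends to shrink as dimension grows.
    for dim in (2, 100, 1000):
        pts = [[random.random() for _ in range(dim)] for _ in range(50)]
        q = [random.random() for _ in range(dim)]
        print(dim, round(relative_contrast(pts, q), 3))
```

The shrinking relative contrast illustrates why raw k-NN distances can become less discriminative in spaces with tens of thousands of dimensions, which is one motivation for comparing methods that first apply dimension reduction.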

Publisher

Walter de Gruyter GmbH

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Hardware and Architecture, Modeling and Simulation, Information Systems

Link

https://www.sciendo.com/pdf/10.2478/jaiscr-2023-0001


Cited by 6 articles.

1. A Brief Survey on Graph Anomaly Detection;Procedia Computer Science;2024

2. Interpretable Single-dimension Outlier Detection (ISOD): An Unsupervised Outlier Detection Method Based on Quantiles and Skewness Coefficients;Applied Sciences;2023-12-22

3. Hyperspectral band selection algorithm based on artificial bee colony fusion genetic idea;2023 3rd International Symposium on Computer Technology and Information Science (ISCTIS);2023-07-07

4. Comparison and Analysis of Detection Methods for Typhoon-Storm Surges Based on Tide-Gauge Data—Taking Coasts of China as Examples;International Journal of Environmental Research and Public Health;2023-02-13

5. Detecting Outliers in Non-IID Data: A Systematic Literature Review;IEEE Access;2023