An iterative approach to unsupervised outlier detection using ensemble method and distance-based data filtering

Published:2022-02-17 Issue:4 Volume:8 Page:3215-3230
ISSN:2199-4536
Container-title:Complex & Intelligent Systems
language:en
Short-container-title:Complex Intell. Syst.

Author:

Chakraborty Bodhan, Chaterjee Agneet, Malakar Samir, Sarkar Ram

Abstract

Outlier (or anomaly) detection is the process of identifying data points whose properties differ from those of the rest of the data. It is important in domains such as fraud detection, network intrusion detection, and spam filtering. In this paper, we introduce a new outlier detection algorithm that combines an ensemble method with distance-based data filtering in an iterative approach to detect outliers in unlabeled data. The ensemble method clusters the unlabeled data and filters out potential isolated outliers by iteratively applying a cluster membership threshold until the Dunn index score of the clustering is maximized. The distance-based data filtering, in turn, removes potential outlier clusters from the clustered data based on a distance threshold, using the Euclidean distance of each data point from the majority cluster as the filtering factor. We evaluate the algorithm on 10 real-world machine learning datasets and compare its results with those of various supervised and unsupervised outlier detection algorithms using the Precision@n and F-score evaluation metrics.
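As a rough illustration of the distance-based filtering step and the Precision@n metric mentioned in the abstract, the following minimal Python sketch may help. It is not the authors' implementation. It assumes that "distance from the majority cluster" means Euclidean distance to that cluster's centroid, and the function names, toy data, and threshold value are all hypothetical.

```python
import math
from collections import Counter


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def distances_to_majority(points, labels):
    """Euclidean distance of every point to the centroid of the largest cluster.

    Assumption: the majority cluster is represented by its centroid;
    the paper may define this distance differently.
    """
    majority_label = Counter(labels).most_common(1)[0][0]
    members = [p for p, lab in zip(points, labels) if lab == majority_label]
    center = centroid(members)
    return [math.dist(p, center) for p in points]


def precision_at_n(scores, is_outlier, n):
    """Fraction of true outliers among the n highest-scoring points."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:n] if is_outlier[i]) / n


# Toy data: four points in a dense cluster (label 0), two far away (label 1).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
labels = [0, 0, 0, 0, 1, 1]
truth = [False, False, False, False, True, True]

threshold = 3.0  # hypothetical distance threshold
scores = distances_to_majority(points, labels)
flags = [d > threshold for d in scores]
p_at_2 = precision_at_n(scores, truth, 2)
```

In this toy setup the two distant points are the only ones beyond the threshold, so they are the ones flagged. In practice the threshold would need to be chosen per dataset.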

Publisher

Springer Science and Business Media LLC

Subject

General Earth and Planetary Sciences,General Environmental Science

Link

https://link.springer.com/content/pdf/10.1007/s40747-022-00674-0.pdf

References: 43 articles.

1. Borah A, Nath B (2019) Rare pattern mining: challenges and future perspectives. Complex Intell Syst 5:1–23

2. Dhieb N, Ghazzai H, Besbes H, Massoud Y (2019) A very deep transfer learning model for vehicle damage detection and localization. In: 2019 31st international conference on microelectronics (ICM). IEEE, pp 158–161

3. Sarkar BK (2017) Big data for secure healthcare system: a conceptual design. Complex Intell Syst 3:133–151

4. Shambharkar V, Sahare V (2016) Survey on outlier detection for support vector machine. Int J Data Min Tech Appl 5:11–14

5. Shah V, Aggarwal AK, Chaubey N (2017) Performance improvement of intrusion detection with fusion of multiple sensors. Complex Intell Syst 3:33–39

Cited by 10 articles.

1. A Survey on Improved Feature Engineering Assisted Multi-Constraints Outlier Analysis Model for Heuristic Driven K-Means Clustering;2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS);2024-04-18

2. Customs valuation assessment using cluster-based approach;International Journal of Information Technology;2024-04-05

3. A Novel Filtering Method of Travel-Time Outliers Extracted from Large-Scale Traffic Checkpoint Data;Journal of Transportation Engineering, Part A: Systems;2024-02

4. Survey on extreme learning machines for outlier detection;Machine Learning;2024-01-23

5. Boundary-aware local Density-based outlier detection;Information Sciences;2023-11