Affiliation:
1. KAIST, Daejeon, South Korea
2. University of Vermont
Abstract
This paper addresses the problem of efficiently detecting outliers in a data stream as old data points expire from the window and new data points enter it incrementally. The proposed method is based on a newly discovered characteristic of data streams: the change in the locations of data points in the data space is typically very small. This observation has led to the finding that existing distance-based outlier detection algorithms perform a large number of unnecessary computations that are repetitive or whose effects cancel each other out. Thus, in this paper, we propose a novel set-based approach to detecting outliers, whereby data points at similar locations are grouped and the detection of outliers or inliers is handled at the group level. Specifically, a new algorithm, NETS, is proposed to achieve a remarkable performance improvement by realizing set-based early identification of outliers or inliers and taking advantage of the "net effect" between expired and new data points. Additionally, NETS is capable of achieving the same efficiency even for a high-dimensional data stream through two-level dimensional filtering. Comprehensive experiments using six real-world data streams show that NETS processes streams 5 to 25 times faster than state-of-the-art algorithms with comparable memory consumption. We assert that NETS opens up new possibilities for real-time data stream outlier detection.
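The set-based idea and the net effect described in the abstract can be illustrated with a small sketch. This is not the authors' NETS implementation. It is a simplified toy that assumes the usual distance-based definition (a point is an outlier if it has fewer than `k` neighbors within radius `r`) and groups points into grid cells whose diagonal equals `r`. The class name `GridOutlierSketch`, its methods, and the parameters `r`, `k`, and `dim` are all hypothetical, and two-level dimensional filtering is not covered.

```python
import math
from collections import Counter, defaultdict
from itertools import product


class GridOutlierSketch:
    """Toy sliding-window detector (illustrative only, not NETS itself).

    Assumed definition: a point is an outlier if it has fewer than k
    neighbors within distance r among the points currently in the window.
    """

    def __init__(self, r, k, dim):
        self.r, self.k, self.dim = r, k, dim
        # Cell diagonal equals r, so any two points in one cell are neighbors.
        self.side = r / math.sqrt(dim)
        # Neighbors can only lie in cells at most `reach` indices away per axis.
        self.reach = math.ceil(r / self.side)
        self.cells = defaultdict(list)  # cell index -> points in the window

    def cell_of(self, p):
        return tuple(math.floor(x / self.side) for x in p)

    def slide(self, new_points, expired_points):
        """Apply one window slide and return the net count change per cell."""
        net = Counter()
        for p in expired_points:
            c = self.cell_of(p)
            self.cells[c].remove(p)
            net[c] -= 1
        for p in new_points:
            c = self.cell_of(p)
            self.cells[c].append(p)
            net[c] += 1
        # Arrivals and expirations that cancel leave the cell count unchanged.
        return {c: d for c, d in net.items() if d != 0}

    def outliers(self):
        result = []
        for c, pts in self.cells.items():
            if not pts:
                continue
            # Cell-level inlier test: each point has len(pts) - 1 neighbors
            # inside its own cell alone.
            if len(pts) - 1 >= self.k:
                continue
            offsets = product(range(-self.reach, self.reach + 1), repeat=self.dim)
            nearby = [
                q
                for off in offsets
                for q in self.cells.get(tuple(i + o for i, o in zip(c, off)), [])
            ]
            # Cell-level outlier test: too few points nearby to ever reach k.
            if len(nearby) - 1 < self.k:
                result.extend(pts)
                continue
            # Point-level fallback for cells the two tests could not decide.
            for p in pts:
                n = sum(
                    1 for q in nearby
                    if q is not p and math.dist(p, q) <= self.r
                )
                if n < self.k:
                    result.append(p)
        return result
```

In this sketch, `slide` reports only cells whose population actually changed, which is the sense in which expired and new points at similar locations cancel out. `outliers` decides whole cells at once where it can and falls back to per-point distance checks only for the remaining cells. Enumerating neighboring cells grows exponentially with `dim`, so the sketch is suitable only for low-dimensional data.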
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
43 articles.
1. Online Drift Detection with Maximum Concept Discrepancy;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24
2. Distance-Based Outlier Query Optimization in Apache IoTDB;Proceedings of the VLDB Endowment;2024-07
3. Parameter-free Streaming Distance-based Outlier Detection;2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW);2024-05-13
4. Multiple Continuous Top-K Queries Over Data Stream;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
5. IoTDQ: An Industrial IoT Data Analysis Library for Apache IoTDB;Big Data Mining and Analytics;2024-03