Outlier detection for high dimensional data

Published:2001-06 Issue:2 Volume:30 Page:37-46
ISSN:0163-5808
Container-title:ACM SIGMOD Record
language:en
Short-container-title:SIGMOD Rec.

Author:

Aggarwal Charu C.¹, Yu Philip S.¹

Affiliation:

1. IBM T. J. Watson Research Center, Yorktown Heights, NY

Abstract

The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.
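The abstract's claim that proximity loses meaning in high dimensions can be illustrated with a small experiment. The sketch below is not the paper's method. It assumes points drawn uniformly from the unit cube and uses a hypothetical helper, `relative_contrast`, which measures how much farther the farthest point is than the nearest one from a random query point. Under these assumptions the contrast tends to shrink as the dimension grows, which is one way to see why every point can look like an almost equally good outlier under distance-based definitions.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^dim.

    Hypothetical helper for illustration only; uniform data is an assumption.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Print the contrast for a few dimensionalities to compare them.
    for dim in (2, 20, 200, 2000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

A small contrast means the nearest and farthest neighbours are at nearly the same distance, so a distance threshold separates few points from the rest.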

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/376284.375668

Reference 27 articles.

1. Re-designing distance functions and distance-based applications for high dimensional data

2. Fast algorithms for projected clustering

3. Finding generalized projected clusters in high dimensional spaces

4. C. C. Aggarwal, J. B. Orlin, R. P. Tai. Optimized Crossover for the Independent Set Problem. Operations Research 45(2), March 1997.

5. Automatic subspace clustering of high dimensional data for data mining applications

Cited by 352 articles.

1. Outlier detection for heterogeneous data via fuzzy β covering;Expert Systems with Applications;2024-10

2. On Combining Instance Selection and Discretisation: A Comparative Study of Two Combination Orders;Journal of Information & Knowledge Management;2024-08-17

3. Context discovery for anomaly detection;International Journal of Data Science and Analytics;2024-06-18

4. Chemoinformatic regression methods and their applicability domain;Molecular Informatics;2024-05-28

5. Electricity Theft Detection in Smart Grids Using Sarimax and OCR;International Journal of Advanced Research in Science, Communication and Technology;2024-05-26