DOLPHIN

Published: 2009-03 Issue: 1 Volume: 3 Page: 1-57
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Language: en
Short-container-title: ACM Trans. Knowl. Discov. Data

Author:

Angiulli Fabrizio¹, Fassetti Fabio¹

Affiliation:

1. DEIS, Università della Calabria, Rende (CS), Italy

Abstract

In this work a novel distance-based outlier detection algorithm, named DOLPHIN, working on disk-resident datasets and whose I/O cost corresponds to the cost of sequentially reading the input dataset file twice, is presented. It is both theoretically and empirically shown that the main memory usage of DOLPHIN amounts to a small fraction of the dataset and that DOLPHIN has linear time performance with respect to the dataset size. DOLPHIN gains efficiency by naturally merging together in a unified schema three strategies, namely the selection policy of objects to be maintained in main memory, usage of pruning rules, and similarity search techniques. Importantly, similarity search is accomplished by the algorithm without the need of preliminarily indexing the whole dataset, as other methods do. The algorithm is simple to implement and it can be used with any type of data, belonging to either metric or nonmetric spaces. Moreover, a modification to the basic method allows DOLPHIN to deal with the scenario in which the available buffer of main memory is smaller than its standard requirements. DOLPHIN has been compared with state-of-the-art distance-based outlier detection algorithms, showing that it is much more efficient.
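For orientation, the kind of problem the abstract refers to can be sketched with a naive in-memory baseline. This is not the DOLPHIN algorithm: it has none of DOLPHIN's two-scan I/O scheme, memory selection policy, or similarity search. It assumes the commonly used distance-based definition, in which an object is an outlier when fewer than k other objects lie within a given distance of it. The function name `distance_based_outliers` and the parameters `k`, `radius`, and `distance` are illustrative assumptions, not names taken from the paper. The early `break` shows the simplest form of a pruning rule: once k neighbors are found, the object cannot be an outlier.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def distance_based_outliers(points, k, radius, distance=dist):
    """Return indices of points having fewer than k other points within radius.

    Naive nested-loop baseline for illustration only; NOT the DOLPHIN
    algorithm. Any distance function can be supplied, metric or not.
    """
    outliers = []
    for i, p in enumerate(points):
        neighbors = 0
        for j, q in enumerate(points):
            if i == j:
                continue  # assumption: an object does not count as its own neighbor
            if distance(p, q) <= radius:
                neighbors += 1
                if neighbors >= k:
                    break  # pruning: p already has k neighbors, so it is an inlier
        if neighbors < k:
            outliers.append(i)
    return outliers


# Four points in a tight cluster and one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(distance_based_outliers(data, k=2, radius=0.5))  # prints [4]
```

This baseline needs up to a quadratic number of distance computations and holds the whole dataset in memory, which is the cost that the abstract says DOLPHIN avoids.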

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/1497577.1497581

References: 34 articles.

1. Outlier detection for high dimensional data

2. Very efficient mining of distance-based outliers

3. Outlier mining in large high-dimensional data sets

Cited by 85 articles.

1. SymNOM-GED: Symmetric neighbor outlier mining in gene expression datasets;Journal of Computational Science;2024-09

2. Layered isolation forest: A multi-level subspace algorithm for improving isolation forest;Neurocomputing;2024-05

3. Enhancing anomaly detectors with LatentOut;Journal of Intelligent Information Systems;2023-11-24

4. Anomaly detection with correlation laws;Data & Knowledge Engineering;2023-05

5. Research on the Derated Power Data Identification Method of a Wind Turbine Based on a Multi-Gaussian–Discrete Joint Probability Model;Sensors;2022-11-17