Affiliation:
1. University of Calabria, Italy
Abstract
In this work, we introduce a novel definition of outlier, the Gradient Outlier Factor (GOF). Our aim is a definition that coincides with the statistical one on some standard distributions but behaves differently in the presence of mixture distributions. Intuitively, the GOF score measures the probability of staying in the neighborhood of a given object: it is directly proportional to the density and inversely proportional to the variation of the density. We derive formal properties under which the GOF definition unifies with the statistical outlier definition, and we show that this unification holds for some standard distributions. At the same time, the GOF is able to capture tails in the presence of several different distributions, even when their densities differ considerably. Moreover, we provide a probabilistic interpretation of the GOF score by means of the notion of the density of the data density. Experimental results confirm that there are scenarios in which the novel definition can be profitably employed. To the best of our knowledge, apart from distance-based outliers, no other data mining outlier definition has such a clearly established relationship with statistical outliers.
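The abstract does not give the GOF formula. As a hedged illustration of its stated intuition only (a score that grows with the density and shrinks with the variation of the density), the sketch below divides a one-dimensional Gaussian kernel density estimate by the magnitude of its derivative. The ratio itself, the Gaussian kernel, the bandwidth `h`, the `eps` guard, and the names `gaussian_kde` and `gradient_ratio_score` are all hypothetical choices for illustration, not the authors' definition. For a standard normal density the ratio reduces to 1/|x|, so lower scores coincide with larger |x|, which is consistent with the statistical notion of an outlier on that distribution.

```python
import math

def gaussian_kde(data, h):
    """Hypothetical helper: return f(x) and f'(x) of a 1-D Gaussian KDE with bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def f(x):
        # Sum of Gaussian bumps centred on each sample point.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    def df(x):
        # Derivative of each bump: -(x - xi) / h^2 * exp(-0.5 * ((x - xi) / h)^2).
        return norm * sum(
            -((x - xi) / h ** 2) * math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data
        )

    return f, df

def gradient_ratio_score(x, f, df, eps=1e-12):
    """Illustrative score (an assumption, not the paper's GOF): density over gradient magnitude.

    Lower values mean the density changes quickly relative to its own size,
    i.e. x sits in a tail; eps avoids division by zero where f'(x) = 0.
    """
    return f(x) / (abs(df(x)) + eps)
```

For a sample spread over [-1, 1] (for example `[-1.0, -0.5, 0.0, 0.5, 1.0]` with `h=0.5`), a point at x = 3.0 receives a lower score than a point at x = 1.5, while points near the centre of the sample, where the estimated density is flat, receive very large scores.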
Publisher
Association for Computing Machinery (ACM)
Cited by
5 articles.