Abstract
Anomalies are data points that are few and different. We show that these two properties make anomalies susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely on the basis of isolation, without employing any distance or density measure, making it fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time complexity with a small memory requirement and (ii) to deal effectively with the effects of swamping and masking. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF, and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when no anomalies are available in the training sample.
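To make the isolation idea concrete, the following is a minimal sketch of an isolation tree and the iForest anomaly score, written from the abstract's description (isolation via random partitioning, plus subsampling). The helper names (`build_tree`, `path_length`, `anomaly_score`) and the specific parameter choices are my own illustrative assumptions, not the authors' reference implementation; scores near 1 indicate points that are isolated quickly and are therefore likely anomalies.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    # Average path length of an unsuccessful BST search on n points;
    # used to normalize path lengths (as in the iForest score definition).
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth):
    # Recursively split on a random attribute at a random cut point.
    # Stop at the height limit or when the node can no longer be split.
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = random.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = random.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split],
                           depth + 1, max_depth),
        "right": build_tree([p for p in points if p[dim] >= split],
                            depth + 1, max_depth),
    }

def path_length(tree, p, depth=0):
    # Depth at which p is isolated; leaves holding several points get
    # the c(size) adjustment for the unbuilt subtree below them.
    if "size" in tree:
        return depth + c(tree["size"])
    child = tree["left"] if p[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, p, depth + 1)

def build_forest(data, n_trees=100, sample_size=64):
    # Subsampling: each tree sees only a small random sample of the data,
    # which keeps time and memory low and mitigates swamping/masking.
    max_depth = math.ceil(math.log2(sample_size))
    return [build_tree(random.sample(data, min(sample_size, len(data))),
                       0, max_depth)
            for _ in range(n_trees)]

def anomaly_score(forest, p, sample_size=64):
    # Score s(p) = 2^(-E[h(p)] / c(sample_size)); anomalies, being few
    # and different, have short average paths and scores close to 1.
    avg = sum(path_length(t, p) for t in forest) / len(forest)
    return 2.0 ** (-avg / c(sample_size))
```

A quick usage check: build a forest over a dense cluster plus one far-away point; the far-away point is isolated in very few splits and receives a markedly higher score than points inside the cluster.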
Funder
Ministry of Science and Technology of the People's Republic of China
National Natural Science Foundation of Jiangsu Province
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Cited by: 1218 articles.