A novel subspace outlier detection method by entropy-based clustering algorithm


Zuo Zheng (1), Li Ziqiang (2), Cheng Pengsen (3), Zhao Jian (1)


1. Chengdu University of Information Technology

2. Sichuan University

3. The National Computer Network Emergency Response Technical Team/Coordination Center of China


Abstract Subspace outlier detection has emerged as a practical approach to outlier detection. Classical full-space outlier detection methods become ineffective on high-dimensional data because of the “curse of dimensionality”, and subspace methods have great potential to overcome this problem. The challenge then becomes how to determine which subspaces to use for outlier detection among the very large number of possible subspaces. In this paper, we first propose an intuitive definition of outliers in subspaces. We study the desirable properties of subspaces for outlier detection and investigate metrics for those properties. We then propose a novel subspace outlier detection algorithm with a statistical foundation. Our method adopts an entropy-based clustering algorithm to identify interesting subspaces for outlier detection. Outliers are then detected by comparing the outlierness of each object with that of its neighbors, and the method uses only a small number of the most interesting subspaces. We show by experiments that outliers discovered in a very small number of the most interesting subspaces are detected with remarkably higher accuracy than in the full space, and that the proposed method outperforms competing subspace outlier detection approaches on real-world data sets.
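
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a grid-histogram Shannon entropy as the "interestingness" metric (lower entropy taken to mean more cluster structure) and a simple neighbor-relative k-distance ratio as the outlierness measure. All function names, the bin count, and the parameter defaults are hypothetical choices made for illustration.

```python
import itertools
import math
import random
from collections import Counter


def grid_entropy(points, dims, bins=5):
    """Shannon entropy (bits) of an equal-width grid histogram over `dims`.

    Assumption: every coordinate is already scaled to [0, 1].
    """
    cells = Counter(
        tuple(min(int(p[d] * bins), bins - 1) for d in dims) for p in points
    )
    n = len(points)
    return -sum(c / n * math.log2(c / n) for c in cells.values())


def rank_subspaces(points, size=2, bins=5, top=1):
    """Return the `top` lowest-entropy subspaces with `size` dimensions."""
    candidates = itertools.combinations(range(len(points[0])), size)
    return sorted(candidates, key=lambda dims: grid_entropy(points, dims, bins))[:top]


def k_distance(points, i, dims, k):
    """k-th nearest-neighbor distance of point i within `dims`, plus the k neighbors."""
    here = [points[i][d] for d in dims]
    dists = sorted(
        (math.dist(here, [q[d] for d in dims]), j)
        for j, q in enumerate(points)
        if j != i
    )
    return dists[k - 1][0], [j for _, j in dists[:k]]


def neighbour_relative_scores(points, dims, k=5):
    """Score each point by its k-distance over the mean k-distance of its neighbors."""
    info = [k_distance(points, i, dims, k) for i in range(len(points))]
    scores = []
    for kd, nbrs in info:
        mean_nbr = sum(info[j][0] for j in nbrs) / k
        if mean_nbr > 0:
            scores.append(kd / mean_nbr)
        else:
            # Neighbors are all duplicates: only a non-duplicate point stands out.
            scores.append(1.0 if kd == 0 else float("inf"))
    return scores


def detect(points, size=2, bins=5, top=1, k=5):
    """Average the neighbor-relative score over the selected subspaces."""
    subspaces = rank_subspaces(points, size, bins, top)
    per_space = [neighbour_relative_scores(points, s, k) for s in subspaces]
    scores = [sum(col) / len(col) for col in zip(*per_space)]
    return subspaces, scores


# Synthetic demo: two clusters in dimensions 0 and 1, uniform noise in 2 and 3.
rng = random.Random(0)
data = []
for i in range(60):
    cx, cy = [(0.3, 0.3), (0.7, 0.7)][i % 2]
    data.append([cx + rng.uniform(-0.05, 0.05), cy + rng.uniform(-0.05, 0.05),
                 rng.random(), rng.random()])
data.append([0.3, 0.7, 0.5, 0.5])  # planted outlier, visible only in dims (0, 1)

subspaces, scores = detect(data)
print("selected subspaces:", subspaces)
print("highest-scoring index:", max(range(len(scores)), key=scores.__getitem__))
```

The planted point lies inside the marginal range of every single dimension, so it stands out only when dimensions 0 and 1 are examined together; this is the situation that motivates searching a few informative subspaces rather than the full space. Enumerating all subspaces, as done here, is feasible only for small dimensionality and is a simplification of whatever search strategy the paper uses.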


Research Square Platform LLC








