A novel subspace outlier detection method by entropy-based clustering algorithm

Authors:

Zuo Zheng (1), Li Ziqiang (2), Cheng Pengsen (3), Zhao Jian (1)

Affiliation:

1. Chengdu University of Information Technology

2. Sichuan University

3. The National Computer Network Emergency Response Technical Team/Coordination Center of China

Abstract

Subspace outlier detection has emerged as a practical approach to outlier detection. Classical full-space outlier detection methods become ineffective on high-dimensional data because of the “curse of dimensionality”, and subspace methods have great potential to overcome this problem. The challenge, however, is to determine which subspaces to use for outlier detection among the huge number of possible subspaces. In this paper, we first propose an intuitive definition of outliers in subspaces. We study the desirable properties of subspaces for outlier detection and investigate metrics for those properties. We then propose a novel subspace outlier detection algorithm with a statistical foundation. Our method adopts an entropy-based clustering algorithm to identify interesting subspaces for outlier detection. Outliers are then detected by comparing the outlierness of each object with that of its neighbors, using only a small number of the most interesting subspaces. Experiments show that outliers discovered in a very small number of the most interesting subspaces achieve remarkably higher accuracy than those discovered in the full space, and that the proposed method outperforms competing subspace outlier detection approaches on real-world data sets.
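
The two-stage idea in the abstract (rank subspaces by an entropy criterion, then score objects against their neighbors in the best subspaces) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. It assumes a grid-histogram Shannon entropy as the interestingness measure (lower entropy taken as more interesting) and a ratio of k-nearest-neighbor distances as the outlierness comparison. All function names, the parameters `bins`, `size`, `keep` and `k`, and the toy data are hypothetical, and attribute values are assumed to be scaled to [0, 1].

```python
import math
from itertools import combinations


def grid_entropy(points, dims, bins=5):
    """Shannon entropy of the grid-cell histogram of points projected onto dims."""
    counts = {}
    for p in points:
        # Assumption: values lie in [0, 1]; a value of 1.0 is clamped into the last bin.
        cell = tuple(min(int(p[d] * bins), bins - 1) for d in dims)
        counts[cell] = counts.get(cell, 0) + 1
    n = len(points)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def top_subspaces(points, size=2, keep=1, bins=5):
    """Return the `keep` subspaces of dimensionality `size` with the lowest entropy."""
    n_dims = len(points[0])
    ranked = sorted(
        combinations(range(n_dims), size),
        key=lambda dims: grid_entropy(points, dims, bins),
    )
    return ranked[:keep]


def knn_info(points, i, dims, k):
    """Mean distance from point i to its k nearest neighbors in dims, plus their indices."""
    here = [points[i][d] for d in dims]
    dists = sorted(
        (math.dist(here, [points[j][d] for d in dims]), j)
        for j in range(len(points))
        if j != i
    )
    nearest = dists[:k]
    return sum(d for d, _ in nearest) / k, [j for _, j in nearest]


def outlier_scores(points, subspaces, k=3):
    """Average, over subspaces, of own k-NN distance divided by the neighbors' k-NN distance."""
    scores = [0.0] * len(points)
    for dims in subspaces:
        info = [knn_info(points, i, dims, k) for i in range(len(points))]
        for i, (own, nbrs) in enumerate(info):
            nbr_mean = sum(info[j][0] for j in nbrs) / k
            # Small constant guards against division by zero for duplicate points.
            scores[i] += own / (nbr_mean + 1e-12)
    return [s / len(subspaces) for s in scores]


# Toy data: 20 objects clustered in attributes 0 and 1 and spread out in attribute 2,
# plus one object (index 20) that deviates only in attributes 0 and 1.
points = [(0.28 + 0.01 * (i % 5), 0.28 + 0.01 * (i // 5), i / 20) for i in range(20)]
points.append((0.9, 0.9, 0.5))

best = top_subspaces(points, size=2, keep=1)
scores = outlier_scores(points, best, k=3)
print("selected subspaces:", best)
print("highest-scoring object:", max(range(len(points)), key=scores.__getitem__))
```

In this sketch the scoring only ever looks at the few subspaces returned by `top_subspaces`, which mirrors the abstract's claim that a small number of interesting subspaces is enough. The exhaustive enumeration of attribute combinations is for illustration only and would not scale to high-dimensional data.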

Publisher

Research Square Platform LLC
