Affiliation:
1. School of Computer Science, Guangdong University of Science and Technology, Dongguan, Guangdong, P.R. China
2. Center for Applied Mathematics of Guangxi, Yulin Normal University, Yulin, Guangxi, P.R. China
Abstract
Outlier detection is critically important in data mining. Real-world data are often imprecise and ambiguous, which rough set theory can handle. Information entropy is an effective way to measure the uncertainty in an information system. Most outlier detection methods can be called unsupervised because they deal only with unlabeled data. When sufficient labeled data are available and these methods are applied to a decision information system, the decision attribute is discarded. Thus, these methods may not be suitable for outlier detection in a decision information system. This paper proposes a supervised outlier detection method based on conditional information entropy and rough set theory. First, the conditional information entropy of a decision information system is calculated on the basis of rough set theory, which provides a more comprehensive measure of uncertainty. Then, the relative entropy and relative cardinality are put forward. Next, the degree of outlierness and a weight function are presented to derive outlier factors. Finally, a conditional information entropy-based (CIE) outlier detection algorithm is given. The performance of the algorithm is evaluated and compared with existing outlier detection algorithms: LOF, KNN, Forest, SVM, IE, and ECOD. Twelve data sets from UCI are used to assess its efficiency and performance. For example, on the Hayes data set the AUC value of the CIE algorithm is 0.949, while the AUC values of LOF, KNN, SVM, Forest, IE, and ECOD are 0.647, 0.572, 0.680, 0.676, 0.928, and 0.667, respectively. The advantage of the proposed method is that it fully utilizes the decision information.
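To make the first step of the abstract concrete, the following is a minimal sketch of conditional entropy in a decision information system. It assumes the standard Shannon form H(d | B) computed over the equivalence classes of the indiscernibility relation on a condition attribute subset B; the abstract does not state the paper's exact definition, so this may differ from it. The function names, the toy table `rows`, and the attribute names `a`, `b`, `d` are all hypothetical. The relative entropy, relative cardinality, weight function, and outlier factor are not sketched here because the abstract does not define them.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into equivalence classes: rows that agree on
    every attribute in attrs are indiscernible and share a block."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def conditional_entropy(rows, cond_attrs, decision):
    """Shannon conditional entropy H(decision | cond_attrs), in bits.

    Assumption: this is the textbook form, not necessarily the
    definition used in the paper.
    """
    n = len(rows)
    h = 0.0
    for block in partition(rows, cond_attrs):
        counts = Counter(rows[i][decision] for i in block)
        for c in counts.values():
            # c / n is the joint probability of (block, decision value);
            # c / len(block) is the decision value's probability within the block.
            h -= (c / n) * log2(c / len(block))
    return h


# Hypothetical toy decision table: condition attributes a, b; decision attribute d.
rows = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "no"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
]

h_a = conditional_entropy(rows, ["a"], "d")
h_ab = conditional_entropy(rows, ["a", "b"], "d")
```

In this toy table, the first two objects are indiscernible on `a` and `b` but carry different decisions, so they account for all of the remaining uncertainty about `d`. This is the kind of decision information that an unsupervised method ignores.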
Subject
Artificial Intelligence, General Engineering, Statistics and Probability