Abstract
Outlier mining has developed rapidly in recent years and is increasingly important in fields such as banking, sensor networks, and health care. Most anomaly detection methods are designed for numerical data and ignore categorical data. In real-world problems, however, both numerical and categorical data must be considered to obtain accurate results. Several methods exist for detecting outliers in high-dimensional numerical data. This paper proposes a feature grouping algorithm for anomaly detection that also handles categorical data. The algorithm correlates the features of the categorical data, groups them into feature clusters, and detects outliers. Each feature is assigned a weight based on its level of appearance, and outlier scores are computed from these weights. The performance of the feature grouping algorithm is compared on UCI datasets with traditional algorithms such as Local Outlier Factor (LOF) and Isolation Forest, and with state-of-the-art methods such as WATCH. The experimental evaluation shows that the proposed algorithm performs better than the existing algorithms on categorical data.
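The abstract does not give the formulas for the feature weights or the outlier scores, so the following is only a minimal Python sketch of the general idea of scoring categorical records by how rarely their values appear. The function name `frequency_outlier_scores`, the negative-log-frequency score, and the uniform default weights are all assumptions made for illustration. The sketch is not the proposed feature grouping algorithm, and the feature clustering step is omitted entirely.

```python
from collections import Counter
from math import log


def frequency_outlier_scores(rows, weights=None):
    """Score rows of categorical values; a higher score means more anomalous.

    Hypothetical helper: the scoring rule is an assumption, not the
    method described in the abstract.
    """
    n = len(rows)
    n_features = len(rows[0])
    if weights is None:
        # Assumption: every feature gets the same weight by default.
        weights = [1.0] * n_features
    # Count how often each value appears in each feature column.
    counts = [Counter(row[j] for row in rows) for j in range(n_features)]
    scores = []
    for row in rows:
        # A rare value has a small relative frequency, so -log(...) is large.
        score = sum(
            w * -log(counts[j][row[j]] / n) for j, w in enumerate(weights)
        )
        scores.append(score)
    return scores


rows = [
    ("red", "small"),
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
]
scores = frequency_outlier_scores(rows)
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
```

In this toy data the last row holds two values that appear only once, so it receives the highest score. Passing a non-uniform `weights` list would let some features count more than others, which is the role feature weights would play in a scheme of this kind.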
Publisher
Universidade Estadual de Maringa