Author:
Li Jiamu, Zhang Ji, Bah Mohamed Jaward, Wang Jian, Zhu Youwen, Yang Gaoming, Li Lingling, Zhang Kexin
Abstract
In high-dimensional data, such as that found in biometric, e-commerce, or industrial applications, it is extremely hard to detect anomalies in the full feature space because of the curse of dimensionality. The large number of features also makes it increasingly complicated, yet essential, to interpret outlier detection results. To alleviate these issues, we propose a new model based on a Variational AutoEncoder and Genetic Algorithm (VAEGA) for detecting outliers in subspaces of high-dimensional data. The model uses a neural network to build a variational autoencoder (VAE), a probabilistic dimensionality-reduction model whose low-dimensional hidden space characterizes the high-dimensional inputs. A hidden vector is then sampled randomly from the hidden space and used to reconstruct the data so that the reconstruction closely matches the input. The reconstruction error serves as the outlier score, and samples whose score exceeds a threshold are tentatively identified as outliers. In the second step, a genetic algorithm (GA) examines and analyzes the abnormal subspaces of the outlier set produced by the VAE layer. After the subspaces of the outlier dataset are encoded, the degree of anomaly of each candidate subspace is calculated with a redefined fitness function. Finally, the abnormal subspace of each detected point is taken to be the subspace with the highest degree of anomaly. Clustering the abnormal subspaces helps filter out points that were mislabeled as outliers (false positives), and the VAE layer adjusts its network weights based on these false positives. In comparisons with other methods on five public datasets, the results of the VAEGA outlier detection model are highly interpretable, and the model outperforms or is competitive with contemporary methods.
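The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a linear PCA reconstruction stands in for the VAE, the threshold is the mean score plus three standard deviations, and the fitness function (sum of absolute z-scores over the selected features, divided by the square root of the subspace size) is hypothetical, since the abstract does not give the redefined fitness function. The names `reconstruction_scores`, `subspace_fitness`, and `ga_abnormal_subspace` are illustrative only, and the clustering and weight-adjustment feedback step is omitted.

```python
import numpy as np

def reconstruction_scores(X, k=2):
    """Outlier score = squared reconstruction error after projecting onto
    the top-k principal components (a linear stand-in for the VAE)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k]
    X_hat = Xc @ W.T @ W
    return ((Xc - X_hat) ** 2).sum(axis=1)

def subspace_fitness(mask, z_point):
    """Hypothetical degree of anomaly of the subspace selected by `mask`."""
    if not mask.any():
        return 0.0
    return float(np.abs(z_point[mask]).sum() / np.sqrt(mask.sum()))

def ga_abnormal_subspace(z_point, rng, pop_size=30, generations=40, p_mut=0.1):
    """Search binary feature masks for the most anomalous subspace of one point."""
    d = z_point.size
    pop = rng.random((pop_size, d)) < 0.5   # random boolean masks
    pop[0] = True                           # also start from the full space
    for _ in range(generations):
        fit = np.array([subspace_fitness(m, z_point) for m in pop])
        order = np.argsort(fit)[::-1]
        parents = pop[order[: pop_size // 2]]   # elitist truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, d)            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(d) < p_mut        # bit-flip mutation
            children.append(child ^ flip)
        pop = np.vstack([parents, np.array(children)])
    fit = np.array([subspace_fitness(m, z_point) for m in pop])
    return pop[np.argmax(fit)]

# Synthetic data: 8 features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
A = np.array([[1, 0, 0, 1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1, 0, 0, 1]], dtype=float)
X = rng.normal(size=(200, 2)) @ A + 0.1 * rng.normal(size=(200, 8))
X[0, [2, 5]] += 8.0   # plant one outlier that deviates in features 2 and 5

# Stage 1: flag samples whose reconstruction error exceeds the threshold.
scores = reconstruction_scores(X, k=2)
threshold = scores.mean() + 3 * scores.std()   # assumed threshold rule
flagged = np.flatnonzero(scores > threshold)

# Stage 2: search for the abnormal subspace of the first flagged sample.
z = (X - X.mean(axis=0)) / X.std(axis=0)
mask = ga_abnormal_subspace(z[flagged[0]], rng)
print("flagged samples:", flagged.tolist())
print("abnormal subspace (feature indices):", np.flatnonzero(mask).tolist())
```

The returned mask gives a per-point explanation, namely the features in which the flagged sample looks most anomalous under the chosen fitness. Because the top half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.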
Funder
Zhejiang Provincial Natural Science Foundation
Natural Science Foundation of China
Exploratory Research Project of Zhejiang Lab
Subject
Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science
Cited by
5 articles.