An Auto-Encoder with Genetic Algorithm for High Dimensional Data: Towards Accurate and Interpretable Outlier Detection

Published: 2022-11-15 Issue: 11 Volume: 15 Page: 429
ISSN: 1999-4893
Container-title: Algorithms
Language: en
Short-container-title: Algorithms

Author:

Li Jiamu, Zhang Ji, Bah Mohamed Jaward, Wang Jian, Zhu Youwen, Yang Gaoming, Li Lingling, Zhang Kexin

Abstract

When dealing with high-dimensional data, such as in biometric, e-commerce, or industrial applications, it is extremely hard to capture abnormalities in the full space due to the curse of dimensionality. Furthermore, because of the large number of features, interpreting outlier detection results in high-dimensional space is becoming increasingly complicated yet essential. To alleviate these issues, we propose a new model based on a Variational AutoEncoder and Genetic Algorithm (VAEGA) for detecting outliers in subspaces of high-dimensional data. The proposed model employs a neural network to build a variational autoencoder (VAE) for probabilistic dimensionality reduction, which uses its low-dimensional hidden space to characterize the high-dimensional inputs. A hidden vector is then sampled randomly from the hidden space to reconstruct the data so that it closely matches the input data. The reconstruction error is computed to determine an outlier score, and samples whose score exceeds the threshold are tentatively identified as outliers. In the second step, a genetic algorithm (GA) is used to examine and analyze the abnormal subspaces of the outlier set obtained by the VAE layer. After the outlier dataset's subspaces are encoded, the degree of anomaly of each detected subspace is calculated using the redefined fitness function. Finally, the abnormal subspace of each detected point is determined by selecting the subspace with the highest degree of anomaly. Clustering the abnormal subspaces helps filter out points mislabeled as outliers (false positives), and the VAE layer adjusts the network weights based on these false positives. In comparisons with other methods on five public datasets, the VAEGA outlier detection model produces highly interpretable results and outperforms or is competitive with current methods.
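The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The VAE itself is omitted, and `flag_outliers` simply assumes that per-sample reconstruction errors have already been computed. The abstract does not specify the redefined fitness function, so `subspace_fitness` uses a hypothetical stand-in (the mean absolute z-score of the point over the selected dimensions). The function names, the binary mask encoding, truncation selection, single-point crossover, and all parameter defaults are likewise assumptions made for illustration.

```python
import random
import statistics


def flag_outliers(scores, threshold):
    """Return indices of samples whose outlier score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def subspace_fitness(mask, point, means, stdevs):
    """Hypothetical fitness (NOT the paper's): mean absolute z-score of
    `point` over the dimensions selected by the binary `mask`."""
    dims = [d for d, bit in enumerate(mask) if bit]
    if not dims:
        return 0.0  # the empty subspace carries no evidence of anomaly
    return sum(abs(point[d] - means[d]) / stdevs[d] for d in dims) / len(dims)


def ga_abnormal_subspace(point, data, pop_size=20, generations=30,
                         mutation_rate=0.1, seed=0):
    """Search binary-encoded subspaces for the one in which `point` looks
    most anomalous. Requires at least two dimensions."""
    rng = random.Random(seed)
    n_dims = len(point)
    columns = list(zip(*data))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]

    def score(mask):
        return subspace_fitness(mask, point, means, stdevs)

    # Each individual is a bit list: 1 means the dimension is in the subspace.
    population = [[rng.randint(0, 1) for _ in range(n_dims)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_dims - 1)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability `mutation_rate`.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children

    best = max(population, key=score)
    return [d for d, bit in enumerate(best) if bit], score(best)
```

Because the best individuals are carried over unchanged between generations, the best fitness never decreases. With this stand-in fitness, the search favors small subspaces concentrated on the dimensions where the point deviates most, which is one simple way to obtain an interpretable explanation for a flagged sample.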

Funder

Zhejiang Provincial Natural Science Foundation

Natural Science Foundation of China

Exploratory Research Project of Zhejiang Lab

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/15/11/429/pdf

References: 44 articles.

1. Hawkins, S., He, H., Williams, G., and Baxter, R. (2002, January 4–6). Outlier detection using replicator neural networks. Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Aix-en-Provence, France.

2. EMM-CLODS: An Effective Microcluster and Minimal Pruning CLustering-Based Technique for Detecting Outliers in Data Streams; Complexity, 2021

3. Cleaning method for status monitoring data of power equipment based on stacked denoising autoencoders; IEEE Access, 2017

4. Outlier detection in ocean wave measurements by using unsupervised data mining methods; Pol. Marit. Res., 2018

5. Dimensionality reduction for intrusion detection systems in multi-data streams—A review and proposal of unsupervised feature selection scheme; Emergent Comput., 2017

Cited by 5 articles.

1. Outlier Interpretation Using Regularized Auto Encoders and Genetic Algorithm; 2024 IEEE Congress on Evolutionary Computation (CEC); 2024-06-30

2. Enhancing the Performance of PSO Algorithm for Clustering High-Dimensional Data Using Autoencoders; Lecture Notes in Networks and Systems; 2024

3. Power Quality Disturbances Data Dimensionality Reduction Using Autoencoder; Green Energy and Technology; 2024

4. Periodicity Intensity Reveals Insights into Time Series Data: Three Use Cases; Algorithms; 2023-02-15

5. Active Power Load Data Dimensionality Reduction Using Autoencoder; Power Quality in Microgrids: Issues, Challenges and Mitigation Techniques; 2023