Abstract
A challenge to many real-world data streams is imbalance with concept drift, which is one of the most critical tasks in anomaly detection. Learning nonstationary data streams for anomaly detection has been well studied in recent years. However, most of the researches assume that the class of data streams is relatively balanced. Only a few approaches tackle the joint issue of imbalance and concept drift. To overcome this joint issue, we propose an ensemble learning method with generative adversarial network-based sampling and consistency check (EGSCC) in this paper. First, we design a comprehensive anomaly detection framework that includes an oversampling module by generative adversarial network, an ensemble classifier, and a consistency check module. Next, we introduce double encoders into GAN to better capture the distribution characteristics of imbalanced data for oversampling. Then, we apply the stacking ensemble learning to deal with concept drift. Four base classifiers of SVM, KNN, DT and RF are used in the first layer, and LR is used as meta classifier in second layer. Last but not least, we take consistency check of the incremental instance and check set to determine whether it is anormal by statistical learning, instead of threshold-based method. And the validation set is dynamic updated according to the consistency check result. Finally, three artificial data sets obtained from Massive Online Analysis platform and two real data sets are used to verify the performance of the proposed method from four aspects: detection performance, parameter sensitivity, algorithm cost and anti-noise ability. Experimental results show that the proposed method has significant advantages in anomaly detection of imbalanced data streams with concept drift.
Funder
Fundamental Research Funds for the Central Universities of Civil Aviation University of China
Publisher
Public Library of Science (PLoS)
Reference41 articles.
1. Ensemble learning for data stream analysis: A survey;B Krawczyk;Inf. Fusion,2017
2. A systematic study of online class imbalance learning with concept drift;S Wang;IEEE Trans. Neural Netw. Learn. Syst,2017
3. A survey on concept drift adaptation;J Gama;ACM Comput. Surv,2014
4. Solution to data imbalance problem in application layer anomaly detection systems;R Kozik;International Conference on Hybrid Artificial Intelligence Systems,2016
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献