Affiliation:
1. Guangzhou University
2. Harbin Institute of Technology (Shenzhen)
Abstract
In network security data, normal samples usually far outnumber malicious samples, which makes the data imbalanced. When dealing with imbalanced samples, a classification model requires careful sampling and attribute selection to cope with its bias towards the majority classes. Simple data sampling methods and incomplete feature selection techniques cannot improve the accuracy of intrusion detection models. In addition, a single intrusion detection model cannot accurately classify all attack types in the face of massive imbalanced security data. Moreover, existing model integration methods based on stacking or voting suffer from high coupling, which undermines their stability and reliability. To address these issues, we propose a Multiple Integration Model (MIM) to implement feature selection and attack classification. First, MIM uses random Oversampling, random Undersampling and Washing Methods (OUWM) to reconstruct the data. Then, a modified simulated annealing algorithm is employed to generate candidate features. Finally, an integrated model based on Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost) and gradient Boosting with Categorical features support (CatBoost) is designed to achieve intrusion detection and attack classification. MIM leverages a Rule-based and Priority-based Ensemble Strategy (RPES) to combine the high accuracy of the first model with the high effectiveness of the other two, improving the stability and reliability of the integrated model. We evaluate the effectiveness of our approach on two publicly available intrusion detection datasets: one created by researchers from the University of New Brunswick and another collected by the Australian Center for Cyber Security.
In our experiments, MIM significantly outperforms several existing intrusion detection models in terms of accuracy, such as quadratic discriminant analysis, k-nearest neighbor, and back propagation. Specifically, MIM achieves higher accuracy than two well-known models, one that combines a deep neural network with a deep auto-encoder and another that combines an incremental extreme learning machine with an adaptive principal component, with improvements of 5.12% and 5.79%, respectively.
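The abstract does not spell out how OUWM works, so the following is only a minimal sketch of generic random oversampling and random undersampling, not the authors' method. The function name `rebalance`, the `target_per_class` parameter, and the toy labels are assumptions made for illustration, and the washing step is omitted because it is not described here.

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, labels, target_per_class, seed=0):
    """Resample every class to exactly target_per_class items.

    Hypothetical helper, not taken from the paper: classes that are
    too large are randomly undersampled without replacement, and
    classes that are too small are randomly oversampled by
    duplicating existing items.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target_per_class:
            # Random undersampling: keep a random subset.
            chosen = rng.sample(xs, target_per_class)
        else:
            # Random oversampling: keep every original item and
            # add random duplicates until the target is reached.
            extra = rng.choices(xs, k=target_per_class - len(xs))
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y


# Toy data: 90 "normal" records and 10 "attack" records.
samples = list(range(100))
labels = ["normal"] * 90 + ["attack"] * 10
new_x, new_y = rebalance(samples, labels, target_per_class=30)
class_counts = Counter(new_y)
```

After the call, `class_counts` holds 30 items for each class. Duplicating minority samples adds no new information, so a resampling step like this is normally applied to the training split only.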
Publisher
Research Square Platform LLC