Building an intrusion detection system on UNSW‐NB15: Reducing the margin of error to deal with data overlap and imbalance

Published: 2024-08-22
ISSN: 1532-0626
Container-title: Concurrency and Computation: Practice and Experience
Language: en
Short-container-title: Concurrency and Computation

Author:

Zoghi Zeinab¹, Serpen Gursel¹

Affiliation:

1. Electrical Engineering & Computer Science, University of Toledo, Toledo, Ohio, USA

Abstract

This study addresses the challenge of data imbalance and class overlap in machine learning for intrusion detection, proposing that targeted algorithmic adjustments can significantly enhance model performance. Our hypothesis contends that an ensemble framework, adeptly integrating novel threshold‐adjustment algorithms, can improve classification sensitivity and specificity. To test this, we developed an ensemble model comprising Balanced Bagging (BB), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF), fine‐tuned using grid search for BB and XGBoost, and augmented with the Hellinger metric for RF to tackle data imbalance. The innovation lies in our algorithms, which adeptly adjust the discrimination threshold to rectify the class overlap problem, enhancing the model's ability to discern between negative and positive classes. Utilizing the UNSW‐NB15 dataset, we conducted a comparative analysis for binary and multi‐category classification. Our ensemble model achieved a binary classification accuracy of 97.80%, with a sensitivity rate of 98.26% for detecting attacks, and a multi‐category classification accuracy and sensitivity that reached up to 99.73% and 97.24% for certain attack types. These results substantially surpass those of existing models on the same dataset, affirming our model's superiority in dealing with complex data distributions prevalent in network security domains.
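The general idea of adjusting a discrimination threshold on ensemble scores can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a simple soft vote (averaging each model's attack probability) and uses Youden's J statistic (sensitivity + specificity - 1) as a stand-in selection criterion. All function names and the toy probabilities are hypothetical and were made up for illustration only.

```python
from typing import List, Sequence, Tuple


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Soft vote: average the attack-class probability across models."""
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Label a sample as attack (1) when its score >= threshold, then score it."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def best_threshold(
    y_true: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float]:
    """Assumed criterion: pick the observed score that maximizes Youden's J."""
    best_t, best_j = 0.5, -1.0
    for candidate in sorted(set(scores)):
        sens, spec = sensitivity_specificity(y_true, scores, candidate)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = candidate, j
    return best_t, best_j


# Toy, imbalanced validation set: 6 normal flows (0) and 2 attacks (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical attack probabilities from three already-trained models.
bb_probs = [0.10, 0.20, 0.30, 0.25, 0.15, 0.40, 0.40, 0.80]
xgb_probs = [0.05, 0.10, 0.20, 0.30, 0.10, 0.30, 0.45, 0.90]
rf_probs = [0.15, 0.15, 0.25, 0.20, 0.20, 0.35, 0.35, 0.70]

scores = average_probabilities([bb_probs, xgb_probs, rf_probs])

# The default 0.5 cut-off misses the attack that overlaps the normal class.
print("default:", sensitivity_specificity(y_true, scores, 0.5))

tuned_t, tuned_j = best_threshold(y_true, scores)
print("tuned threshold:", tuned_t, "Youden's J:", tuned_j)
print("tuned:", sensitivity_specificity(y_true, scores, tuned_t))
```

In practice the threshold would be chosen on a held-out validation split rather than on the data used for the final evaluation. Any criterion that trades sensitivity against specificity could replace Youden's J.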

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.8242
