Affiliation:
1. Electrical Engineering & Computer Science University of Toledo Toledo Ohio USA
Abstract
SummaryThis study addresses the challenge of data imbalance and class overlap in machine learning for intrusion detection, proposing that targeted algorithmic adjustments can significantly enhance model performance. Our hypothesis contends that an ensemble framework, adeptly integrating novel threshold‐adjustment algorithms, can improve classification sensitivity and specificity. To test this, we developed an ensemble model comprising Balanced Bagging (BB), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF), fine‐tuned using grid search for BB and XGBoost, and augmented with the Hellinger metric for RF to tackle data imbalance. The innovation lies in our algorithms, which adeptly adjust the discrimination threshold to rectify the class overlap problem, enhancing the model's ability to discern between negative and positive classes. Utilizing the UNSW‐NB15 dataset, we conducted a comparative analysis for binary and multi‐category classification. Our ensemble model achieved a binary classification accuracy of 97.80%, with a sensitivity rate of 98.26% for detecting attacks, and a multi‐category classification accuracy and sensitivity that reached up to 99.73% and 97.24% for certain attack types. These results substantially surpass those of existing models on the same dataset, affirming our model's superiority in dealing with complex data distributions prevalent in network security domains.