Author:
Li Li-Hua,Ahmad Ramli,Tanone Radius,Sharma Alok Kumar
Abstract
Attacks on the Intrusion Detection System (IDS) can result in an imbalanced dataset, making it difficult to predict what types of attacks will occur. A novel method called SMOTE Tree Boosting (STB) is proposed to generate synthetic tabular data from imbalanced datasets using the Synthetic Minority Oversampling Technique (SMOTE) method. In this experiment, multiple datasets were used along with three boosting-based machine learning algorithms (LightGBM, XGBoost, and CatBoost). Our results show that using SMOTE improves the content accuracy of the LightGBM and XGBoost algorithms. Using SMOTE also helps to better predict computational processes. proven by its accuracy and F1 score, which average 99%, which is higher than several previous studies attempting to solve the same problem known as imbalanced IDS datasets. Based on an analysis of the three IDS datasets, the average computation time required for the LightGBM model is 2.29 seconds, 11.58 seconds for the XGBoost model, and 52.9 seconds for the CatBoost model. This shows that our proposed model is able to process data quickly.