Balancing the Scale: Data Augmentation Techniques for Improved Supervised Learning in Cyberattack Detection

Published: 2024-09-04
Volume: 5
Issue: 3
Pages: 2170-2205
ISSN: 2673-4117
Container-title: Eng
Short-container-title: Eng
Language: en

Author:

Medvedieva Kateryna¹, Tosi Tommaso¹, Barbierato Enrico¹, Gatti Alice¹

Affiliation:

1. Department of Mathematics and Physics, Catholic University of the Sacred Heart, 25121 Brescia, Italy

Abstract

The increasing sophistication of cyberattacks calls for detection systems that can accurately identify and mitigate threats. This research addresses cyberattack detection by generating a realistic yet imbalanced dataset that simulates various types of cyberattacks. Recognizing the limitations that imbalanced data imposes, we explored multiple data augmentation techniques to improve the model's learning effectiveness and ensure robust performance across attack scenarios. First, we constructed a detailed dataset reflecting real-world network intrusion conditions by simulating a range of cyberattack types, so that it exhibits the class imbalance typical of genuine cybersecurity threats. We then applied several data augmentation techniques, including SMOTE and ADASYN, to correct the skew in class distribution and provide a more balanced dataset for training supervised machine learning models. Our evaluation of these techniques across various models, such as Random Forests and Neural Networks, demonstrates significant improvements in detection capabilities. The analysis also investigates feature importance, showing which attributes most strongly influence the models' predictions. This improves the interpretability of the models and helps refine feature engineering and selection to optimize performance.
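The abstract names SMOTE as one of the techniques used to rebalance the classes but does not show the authors' implementation. As a rough illustration only, the core idea of SMOTE-style oversampling (creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours) might be sketched as below. The function name `smote_sketch`, its parameters, and the toy data are all hypothetical and not taken from the paper; practical work would more likely rely on a maintained library such as imbalanced-learn.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Simplified illustration of the SMOTE idea; not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced data (made-up values): 8 "benign" rows versus 3 "attack" rows.
benign = [[float(i), float(i % 3)] for i in range(8)]
attack = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Generate just enough synthetic attack rows to match the majority class size.
new_attack = smote_sketch(attack, n_new=len(benign) - len(attack))
balanced_attack = attack + new_attack
print(len(benign), len(balanced_attack))
```

Because every synthetic row lies on a line segment between two existing minority rows, the new samples stay inside the region already occupied by the minority class. ADASYN, also named in the abstract, follows a similar interpolation scheme but generates more samples around minority points that are harder to classify.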

Publisher

MDPI AG

Link

https://www.mdpi.com/2673-4117/5/3/114/pdf
