Investigating Generalized Performance of Data-Constrained Supervised Machine Learning Models on Novel, Related Samples in Intrusion Detection

Published:2023-02-07 Issue:4 Volume:23 Page:1846
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

D’hooge Laurens¹, Verkerken Miel¹, Wauters Tim¹, De Turck Filip¹, Volckaert Bruno¹

Affiliation:

1. IDLab, Department of Information Technology, Ghent University-imec, 9052 Gent, Belgium

Abstract

Recently proposed methods in intrusion detection are iterating on machine learning methods as a potential solution. These novel methods are validated on one or more datasets from a sparse collection of academic intrusion detection datasets. Their recognition as improvements to the state-of-the-art is largely dependent on whether they can demonstrate a reliable increase in classification metrics compared to similar works validated on the same datasets. Whether these increases are meaningful outside of the training/testing datasets is rarely asked and never investigated. This work aims to demonstrate that strong general performance does not typically follow from strong classification on the current intrusion detection datasets. Binary classification models from a range of algorithmic families are trained on the attack classes of CSE-CIC-IDS2018, a state-of-the-art intrusion detection dataset. After establishing baselines for each class at various points of data access, the same trained models are tasked with classifying samples from the corresponding attack classes in CIC-IDS2017, CIC-DoS2017 and CIC-DDoS2019. Contrary to what the baseline results would suggest, the models have rarely learned a generally applicable representation of their attack class. Stability and predictability of generalized model performance are central issues for all methods on all attack classes. Focusing only on the three best-in-class models in terms of interdataset generalization reveals that for network-centric attack classes (brute force, denial of service and distributed denial of service), general representations can be learned with flat losses in classification performance (precision and recall) below 5%. Other attack classes vary in generalized performance, from stark losses in recall (−35%) with intact precision (98+%) for botnets to total degradation of precision and moderate recall loss for Web attack and infiltration models.
The core conclusion of this article is a warning to researchers in the field. Expecting results of proposed methods on the test sets of state-of-the-art intrusion detection datasets to translate to generalized performance is likely a serious overestimation. Four proposals to reduce this overestimation are set out as future work directions.
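The evaluation idea described in the abstract (score a trained binary model on its own dataset, score the same model on a related dataset, then compare precision and recall) can be illustrated with a minimal Python sketch. The function names and the toy labels below are assumptions made purely for illustration; they are not taken from the article and do not reproduce any of its results.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels where 1 = attack, 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def generalization_gap(baseline: Tuple[float, float],
                       external: Tuple[float, float]) -> Tuple[float, ...]:
    """Change (external minus baseline) in precision and recall, in percentage points."""
    return tuple(round(100 * (e - b), 1) for b, e in zip(baseline, external))


# Hypothetical toy labels, NOT data from the article.
# "Same dataset": predictions on the held-out part of the training dataset.
same_ds_true = [1, 1, 1, 1, 0, 0, 0, 0]
same_ds_pred = [1, 1, 1, 1, 0, 0, 0, 0]
# "Other dataset": the same model applied to a related dataset's attack class.
other_ds_true = [1, 1, 1, 1, 0, 0, 0, 0]
other_ds_pred = [1, 1, 0, 0, 0, 0, 0, 0]

baseline = precision_recall(same_ds_true, same_ds_pred)
external = precision_recall(other_ds_true, other_ds_pred)
gap = generalization_gap(baseline, external)
print("baseline (precision, recall):", baseline)
print("external (precision, recall):", external)
print("gap in percentage points:", gap)
```

In this toy case precision stays intact while recall drops, which is the shape of degradation the abstract reports for botnet models; the magnitudes here are arbitrary.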

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/4/1846/pdf

Cited by 10 articles.

1. CNN-based Network Intrusion Detection and Classification Model for Cyber-Attacks;International Journal of Innovative Science and Research Technology (IJISRT);2024-08-05

2. Fine-tuning inflow prediction models: integrating optimization algorithms and TRMM data for enhanced accuracy;Water Science & Technology;2024-07-03

3. From Bytes to Insights: A Systematic Literature Review on Unraveling IDS Datasets for Enhanced Cybersecurity Understanding;IEEE Access;2024

4. Intrusion Detection System Using Machine Learning by RNN Method;E3S Web of Conferences;2024

5. Improving Generalization of ML-Based IDS With Lifecycle-Based Dataset, Auto-Learning Features, and Deep Learning;IEEE Transactions on Machine Learning in Communications and Networking;2024