Characterizing the Impact of Data-Damaged Models on Generalization Strength in Intrusion Detection

Published:2023-04-03 Issue:2 Volume:3 Page:118-144
ISSN:2624-800X
Container-title:Journal of Cybersecurity and Privacy
language:en
Short-container-title:JCP

Author:

D’hooge Laurens¹, Verkerken Miel¹, Wauters Tim¹, De Turck Filip¹, Volckaert Bruno¹

Affiliation:

1. IDLab-Imec, Department of Information Technology, Ghent University, 9052 Gent, Belgium

Abstract

Generalization is a longstanding assumption in articles concerning network intrusion detection through machine learning. Novel techniques are frequently proposed and validated based on the improvement they attain when classifying one or more of the existing datasets. The necessary follow-up question of whether this increased performance in classification is meaningful outside of the dataset(s) is almost never investigated. This lacuna is in part due to the sparse dataset landscape in network intrusion detection and the complexity of creating new data. The introduction of two recent datasets, namely CIC-IDS2017 and CSE-CIC-IDS2018, opened up the possibility of testing generalization capability within similar academic datasets. This work investigates how well models from different algorithmic families, pretrained on CIC-IDS2017, are able to classify the samples in CSE-CIC-IDS2018 without retraining. Earlier work has shown how robust these models are to data reduction when classifying state-of-the-art datasets. This work experimentally demonstrates that the implicit assumption that strong generalized performance naturally follows from strong performance on a specific dataset is largely erroneous. The supervised machine learning algorithms suffered flat losses in classification performance ranging from 0 to 50% (depending on the attack class under test). For non-network-centric attack classes, this performance regression is most pronounced, but even the less affected models that classify the network-centric attack classes still show defects. Current implementations of intrusion detection systems (IDSs) with supervised machine learning (ML) as a core building block are thus very likely flawed if they have been validated on the academic datasets without consideration of their general performance on other academic or real-world datasets.
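
The evaluation protocol described in the abstract (train on one dataset, then classify a second dataset without retraining) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic two-feature samples rather than CIC-IDS2017 or CSE-CIC-IDS2018, the nearest-centroid classifier is a stand-in and not one of the algorithms evaluated in the article, and all function names are hypothetical.

```python
import random


def make_dataset(n, attack_center, seed):
    """Synthetic flows (hypothetical data): label 0 = benign, 1 = attack."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are balanced
        center = attack_center if label else 0.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


def fit_centroids(rows):
    """Stand-in model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in rows:
        s = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(model, x):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


def generalization_gap(train_rows, holdout_rows, external_rows):
    """Train once, then score on same-dataset holdout and on an external dataset."""
    model = fit_centroids(train_rows)  # no retraining after this point
    intra = accuracy(model, holdout_rows)
    inter = accuracy(model, external_rows)
    return intra, inter, intra - inter


# "Source" dataset: attacks are well separated from benign traffic.
source = make_dataset(2000, attack_center=4.0, seed=1)
# "External" dataset: the same attack class manifests differently.
external = make_dataset(1000, attack_center=1.0, seed=2)

intra, inter, gap = generalization_gap(source[:1500], source[1500:], external)
print(f"intra-dataset accuracy: {intra:.3f}")
print(f"inter-dataset accuracy: {inter:.3f}")
print(f"absolute loss:          {gap:.3f}")
```

The point of the sketch is only the shape of the comparison: a high score on a holdout split of the training dataset says nothing by itself about the score on a second dataset, so both numbers have to be measured and reported.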

Publisher

MDPI AG

Subject

General Medicine

Link

https://www.mdpi.com/2624-800X/3/2/8/pdf

Reference: 39 articles.

1. Denning, D., and Neumann, P.G. (1985). SRI International.

2. An intrusion-detection model;Denning;IEEE Trans. Softw. Eng.,1987

3. Detecting zero-day attacks using context-aware anomaly detection at the application-layer;Duessel;Int. J. Inf. Secur.,2017

4. TermID: A distributed swarm intelligence-based approach for wireless intrusion detection;Kolias;Int. J. Inf. Secur.,2017

5. A deep learning approach to network intrusion detection;Shone;IEEE Trans. Emerg. Top. Comput. Intell.,2018

Cited by 1 article.

1. Improving Generalization of ML-Based IDS With Lifecycle-Based Dataset, Auto-Learning Features, and Deep Learning;IEEE Transactions on Machine Learning in Communications and Networking;2024