Affiliation:
1. LyRIDS, ECE Paris, 10 rue Sextius Michel, 75015 Paris, France
Abstract
Intrusion detection systems can perform poorly when they are trained on datasets in which attack data and non-attack data are unbalanced. Most datasets contain more non-attack data than attack data, and this imbalance can bias intrusion detection systems, leaving them vulnerable to cyberattacks. To remedy this issue, we considered the Conditional Tabular Generative Adversarial Network (CTGAN), with its hyperparameters optimized using the tree-structured Parzen estimator (TPE), to balance an insider threat tabular dataset called CMU-CERT, which consists of discrete-value and continuous-value columns. We showed that, with this method, the mean absolute errors between the probability mass functions (PMFs) of the actual data and the PMFs of the data generated using the CTGAN can be relatively small. Then, from the optimized CTGAN, we generated synthetic insider threat data and combined them with the actual data to balance the original dataset. We used the resulting dataset for an intrusion detection system implemented with the Adversarial Environment Reinforcement Learning (AE-RL) algorithm in a multi-agent framework formed by an attacker and a defender. We showed that intrusion detection performance using the combined CTGAN and AE-RL framework improves significantly compared with the case where the dataset is not balanced, giving an F1-score of 0.7617.
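The PMF comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the mean absolute error is taken over the union of categories seen in a single discrete column; the abstract does not specify how the error is aggregated. The function names `empirical_pmf` and `pmf_mae` and the toy column values are hypothetical and not taken from the paper.

```python
from collections import Counter


def empirical_pmf(values, support):
    """Relative frequency of each category in `support` (assumed helper)."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in support]


def pmf_mae(real, synthetic):
    """Mean absolute error between two empirical PMFs.

    Assumption: the error is averaged over the union of categories
    observed in either sample.
    """
    support = sorted(set(real) | set(synthetic))
    p = empirical_pmf(real, support)
    q = empirical_pmf(synthetic, support)
    return sum(abs(a - b) for a, b in zip(p, q)) / len(support)


# Toy discrete column (hypothetical values, not CMU-CERT data).
real_col = ["logon", "logon", "email", "http", "http", "http"]
synth_col = ["logon", "email", "email", "http", "http", "http"]

mae = pmf_mae(real_col, synth_col)
print(f"PMF mean absolute error: {mae:.4f}")
```

A value near zero would indicate that the synthetic column reproduces the category frequencies of the real column closely. Continuous columns would first need binning, which this sketch does not cover.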
Subject
Control and Optimization, Computer Networks and Communications, Instrumentation