Classifying UNSW-NB15 Network Traffic in the Big Data Framework Using Random Forest in Spark

Published: 2021-01 Issue: 1 Volume: 2 Page: 1-23
ISSN: 2644-1675
Container-title: International Journal of Big Data Intelligence and Applications
Language: en

Author:

Bagui Sikha¹, Simonds Jason¹, Plenkers Russell¹, Bennett Timothy A.¹, Bagui Subhash¹

Affiliation:

1. University of West Florida, USA

Abstract

The focus of this work is on detecting and classifying attacks in network traffic using a binary as well as a multi-class machine learning classifier, Random Forest, in a distributed Big Data environment using Apache Spark. The classifier is tested using the UNSW-NB15 dataset. Major problems in these types of datasets include high dimensionality and imbalanced data. To address the issue of high dimensionality, both Information Gain and Principal Components Analysis (PCA) were applied before training and testing the data using Random Forest in Apache Spark. Binary and multi-class Random Forest classifiers were compared in a distributed environment, with and without PCA, using various numbers of Spark cores and Random Forest trees, in terms of performance time and statistical measures. The highest accuracy, 99.94%, was obtained by the binary classifier using 8 cores and 30 trees. This study obtained higher accuracy and lower false alarm rates (FAR) than previously achieved, with low testing times.
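
As a rough illustration of the Information Gain step mentioned in the abstract, the following Python sketch ranks discrete features by how much each one reduces the entropy of the class label, i.e. IG = H(label) - H(label | feature). The function names and the toy records are assumptions made for illustration only; the study's actual Spark implementation and feature set are not shown in this record.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    # Group the labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label within each group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records, not taken from UNSW-NB15.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "proto": ["udp", "udp", "tcp", "tcp"],  # separates the two classes
    "state": ["FIN", "CON", "FIN", "CON"],  # carries no class information
}

# Rank features from most to least informative.
ranking = sorted(
    ((information_gain(column, labels), name) for name, column in features.items()),
    reverse=True,
)
for gain, name in ranking:
    print(f"{name}: {gain:.3f}")
```

In a feature-selection setting, features with the lowest scores would be candidates for removal before training; continuous features would first need to be binned, which this sketch does not attempt.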

Publisher

IGI Global

Subject

General Medicine

Reference: 29 articles.

1. Amrita, & Kant, S. (2019). Machine Learning and Feature Selection Approach for Anomaly based Intrusion Detection: A Systematic Novice Approach. International Journal of Innovative Technology and Exploring Engineering, 8(65), 434-443.

2. Resampling imbalanced data for network intrusion detection datasets

3. Performance evaluation of intrusion detection based on machine learning using Apache Spark

4. Brems, M. (2019). A One-Stop Shop for Principal Component Analysis. Towards Data Science. Available: https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c

5. Random Forests; L. Breiman; Machine Learning, 2001

Cited by 5 articles.

1. Using a Graph Engine to Visualize the Reconnaissance Tactic of the MITRE ATT&CK Framework from UWF-ZeekData22;Future Internet;2023-07-06

2. Three-Way Selection Random Forest Optimization Model for Anomaly Traffic Detection;Electronics;2023-04-10

3. Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks;Future Internet;2023-03-29

4. Detecting Reconnaissance and Discovery Tactics from the MITRE ATT&CK Framework in Zeek Conn Logs Using Spark’s Machine Learning in the Big Data Framework;Sensors;2022-10-20

5. Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15;Big Data and Cognitive Computing;2022-04-07