Author:
Sakshi Bakhare ,Dr. Sudhir W. Mohod
Abstract
The application of machine learning algorithms for anomaly detection in network traffic data is examined in this study. Using a collection of network flow records that includes attributes such as IP addresses, ports, protocols, and timestamps, the study makes use of correlation heatmaps, box plots, and data visualization to identify trends in numerical characteristics. After preprocessing, which includes timestamp conversion to Unix format, three machine learning models Support Vector Machine (SVM), Gaussian Naive Bayes, and Random Forest are used for anomaly identification. The Random Forest Classifier outperforms SVM and Naive Bayes classifiers with better precision and recall for anomaly diagnosis, achieving an accuracy of 87%. Confusion matrices and classification reports are used to evaluate the models, and they show that the Random Forest Classifier performs better than the other models in identifying abnormalities in network traffic. These results provide significant value to the field of cybersecurity by highlighting the effectiveness of machine learning models specifically, the Random Forest Classifier in boosting anomaly detection capacities for network environment security.