Affiliation:
1. KFUEIT: Khwaja Farid University of Engineering and Information Technology
2. Xi'an Jiaotong University
3. Dalian Maritime University
Abstract
Since the advent of email services, spam has been a major concern, because users' security depends on correctly classifying emails as ham or spam. Spam is also a vehicle for malware attacks and has been used for spear phishing, whaling, clone phishing, website forgery, and other harmful activities. However, ensemble Machine Learning (ML) algorithms for detecting and filtering spam emails remain comparatively underexplored. In this research, we propose an ML-based spam email detection algorithm optimized through hyperparameter tuning. The proposed approach uses two feature extraction modules, Count-Vectorizer and TFIDF-Vectorizer, which gave the most effective classification results when applied to three publicly available data sets: Ling Spam, UCI SMS Spam, and the Proposed dataset. To improve classifier performance, we used various ML methods, namely Naive Bayes (NB), Logistic Regression (LR), Extra Tree, Stochastic Gradient Descent (SGD), XG-Boost, Support Vector Machine (SVM), Random Forest (RF), and Multi-Layer Perceptron (MLP), together with parameter optimization approaches: manual search, random search, grid search, and a genetic algorithm. On all three data sets, SGD outperformed the other algorithms. The other ensembles (Extra Tree, RF), the linear models (LR, Linear-SVC), and MLP also performed well, with relatively high precision, recall, accuracy, and F1-score.
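The abstract names the pipeline's components (count and TF-IDF features, an SGD-trained linear classifier, grid and random hyperparameter search) without showing how they fit together. The following is a minimal, dependency-free Python sketch of that kind of pipeline under stated assumptions; it is not the authors' code. The tokenizer, the function names (`tune`, `sgd_train`, etc.), the toy messages, the grid values and the single hold-out validation split are all hypothetical choices. The smoothed IDF and per-row L2 normalisation mirror the defaults of scikit-learn's `TfidfVectorizer`, which the names "Count-Vectorizer" and "TFIDF-Vectorizer" suggest, though the abstract does not say which library was used. The classifier uses logistic loss, one of several losses an SGD classifier can be trained with.

```python
import math
import random
import re
from collections import Counter
from itertools import product

def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(docs):
    # Map each token seen in the training documents to a column index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def count_vectorize(docs, vocab):
    # Count-Vectorizer analogue: sparse {column: raw count}; unseen tokens dropped.
    rows = []
    for doc in docs:
        counts = Counter(t for t in tokenize(doc) if t in vocab)
        rows.append({vocab[t]: float(c) for t, c in counts.items()})
    return rows

def fit_idf(count_rows, n_features):
    # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, fitted on training rows only.
    n = len(count_rows)
    df = Counter(j for row in count_rows for j in row)
    return [math.log((1 + n) / (1 + df[j])) + 1 for j in range(n_features)]

def tfidf_transform(count_rows, idf):
    # TF-IDF weighting followed by L2 normalisation of each row.
    out = []
    for row in count_rows:
        weighted = {j: c * idf[j] for j, c in row.items()}
        norm = math.sqrt(sum(v * v for v in weighted.values())) or 1.0
        out.append({j: v / norm for j, v in weighted.items()})
    return out

def sgd_train(X, y, n_features, alpha, lr, epochs=20, seed=0):
    # Logistic loss + L2 penalty alpha, one shuffled example per update (y: 1 = spam).
    rng = random.Random(seed)
    w, b = [0.0] * n_features, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(w[j] * v for j, v in X[i].items())
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clipped sigmoid
            g = p - y[i]
            w = [wj * (1.0 - lr * alpha) for wj in w]  # gradient step of the L2 term
            for j, v in X[i].items():
                w[j] -= lr * g * v
            b -= lr * g
    return w, b

def predict(model, X):
    w, b = model
    return [int(b + sum(w[j] * v for j, v in row.items()) > 0) for row in X]

def tune(train_docs, y_train, val_docs, y_val, grid, n_random=None, seed=0):
    # Grid search over all combinations, or random search over n_random of them,
    # scoring each candidate by accuracy on a hold-out validation split.
    vocab = build_vocab(train_docs)
    tr, va = count_vectorize(train_docs, vocab), count_vectorize(val_docs, vocab)
    idf = fit_idf(tr, len(vocab))
    feats = {"count": (tr, va),
             "tfidf": (tfidf_transform(tr, idf), tfidf_transform(va, idf))}
    candidates = list(product(grid["features"], grid["alpha"], grid["lr"]))
    if n_random is not None:
        candidates = random.Random(seed).sample(candidates, n_random)
    best = None
    for feat, alpha, lr in candidates:
        X_tr, X_va = feats[feat]
        preds = predict(sgd_train(X_tr, y_train, len(vocab), alpha, lr), X_va)
        acc = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
        if best is None or acc > best[0]:
            best = (acc, {"features": feat, "alpha": alpha, "lr": lr})
    return best

# Hypothetical toy messages (1 = spam, 0 = ham), not data from the paper.
train = ["win a free prize now", "claim your free cash reward",
         "cheap meds free shipping click now", "meeting moved to monday afternoon",
         "please review the attached report", "lunch with the team on friday"]
y_tr = [1, 1, 1, 0, 0, 0]
val, y_va = ["free cash prize click now", "report for the monday meeting"], [1, 0]
grid = {"features": ["count", "tfidf"], "alpha": [1e-4, 1e-2], "lr": [0.01, 0.1, 0.5]}
print("grid search:", tune(train, y_tr, val, y_va, grid))
print("random search:", tune(train, y_tr, val, y_va, grid, n_random=4))
```

Treating the feature extractor as just another hyperparameter lets one search loop compare Count- and TF-IDF features under the same classifier settings. A genetic-algorithm search would replace the fixed candidate list with an evolving population, and a real evaluation would use cross-validation and a separate test set; both are omitted here for brevity.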
Publisher
Research Square Platform LLC
Cited by
4 articles.
1. A semantic-based model with a hybrid feature engineering process for accurate spam detection;Journal of Electrical Systems and Information Technology;2024-07-15
2. Efficient Email Spam Classification with N-gram Features and Ensemble Learning;International Journal of Scientific Research in Computer Science, Engineering and Information Technology;2024-03-28
3. Hybrid Machine Learning Algorithms for Email and Malware Spam Filtering: A Review;European Journal of Theoretical and Applied Sciences;2024-03-01
4. A Comprehensive Review on Email Spam Classification with Machine Learning Methods;International Journal of Scientific Research in Computer Science, Engineering and Information Technology;2023-11-11