Improving spam email classification accuracy using ensemble techniques: a stacking approach

Published: 2023-09-20 Issue: 1 Volume: 23 Page: 505-517
ISSN: 1615-5262
Container-title: International Journal of Information Security
Language: en
Short-container-title: Int. J. Inf. Secur.

Author:

Adnan Muhammad, Imam Muhammad Osama, Javed Muhammad Furqan, Murtza Iqbal

Abstract

Spam emails pose a substantial cybersecurity danger, necessitating accurate classification to reduce unwanted messages and mitigate risks. This study focuses on enhancing spam email classification accuracy using stacking ensemble machine learning techniques. We trained and tested five classifiers: logistic regression, decision tree, K-nearest neighbors (KNN), Gaussian naive Bayes and AdaBoost. To address overfitting, two distinct datasets of spam emails were aggregated and balanced. Evaluating individual classifiers based on recall, precision and F1 score metrics revealed AdaBoost as the top performer. Considering evolving spam technology and new message types challenging traditional approaches, we propose a stacking method. By combining predictions from multiple base models, the stacking method aims to improve classification accuracy. The results demonstrate superior performance of the stacking method with the highest accuracy (98.8%), recall (98.8%) and F1 score (98.9%) among tested methods. Additional experiments validated our approach by varying dataset sizes and testing different classifier combinations. Our study presents an innovative combination of classifiers that significantly improves accuracy, contributing to the growing body of research on stacking techniques. Moreover, we compare classifier performances using a unique combination of two datasets, highlighting the potential of ensemble techniques, specifically stacking, in enhancing spam email classification accuracy. The implications extend beyond spam classification systems, offering insights applicable to other classification tasks. Continued research on emerging spam techniques is vital to ensure long-term effectiveness.
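
The stacking idea summarized in the abstract (base models make predictions, and a second-level model learns how to combine them) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The base learners here are simple one-feature threshold rules rather than the five classifiers named above, the logistic-regression meta-learner is an assumption (the abstract does not name one), and the two-feature toy data and all function names are hypothetical.

```python
import math

def make_stump_trainer(feature_index):
    """Build a trainer for a one-feature threshold classifier (decision stump)."""
    def train(X, y):
        best_t, best_err = 0.0, len(y) + 1
        # Try every observed value as a threshold; keep the one with fewest errors.
        for t in sorted({row[feature_index] for row in X}):
            err = sum((row[feature_index] >= t) != bool(label)
                      for row, label in zip(X, y))
            if err < best_err:
                best_t, best_err = t, err
        return lambda row: 1.0 if row[feature_index] >= best_t else 0.0
    return train

def train_logistic(Z, y, epochs=500, lr=0.5):
    """Meta-learner (assumed): logistic regression fitted by plain SGD."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            score = sum(wi * zi for wi, zi in zip(w, z)) + b
            grad = 1.0 / (1.0 + math.exp(-score)) - label
            w = [wi - lr * grad * zi for wi, zi in zip(w, z)]
            b -= lr * grad
    return lambda z: 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else 0

def fit_stacking(X, y, base_trainers, k=3):
    """Fit base models, then a meta-model on their out-of-fold predictions."""
    n = len(X)
    Z = [[0.0] * len(base_trainers) for _ in range(n)]
    for fold in range(k):
        held = [i for i in range(n) if i % k == fold]
        rest = [i for i in range(n) if i % k != fold]
        for j, trainer in enumerate(base_trainers):
            # Train on the other folds so the meta-features for held-out rows
            # come from a model that never saw those rows.
            model = trainer([X[i] for i in rest], [y[i] for i in rest])
            for i in held:
                Z[i][j] = model(X[i])
    meta = train_logistic(Z, y)
    # Refit every base model on all the data for use at prediction time.
    base_models = [trainer(X, y) for trainer in base_trainers]
    return lambda row: meta([m(row) for m in base_models])

# Hypothetical toy features per email: [link count, spam-keyword count].
X = [[0, 0], [1, 0], [0, 1], [1, 1], [0, 0], [1, 0],
     [5, 4], [6, 3], [4, 5], [7, 6], [5, 5], [6, 4]]
y = [0] * 6 + [1] * 6  # 0 = ham, 1 = spam

predict = fit_stacking(X, y, [make_stump_trainer(0), make_stump_trainer(1)])
print(predict([6, 5]), predict([0, 0]))
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the base models were fitted on, keeps the second level from simply trusting overfitted base models. In practice a library implementation such as scikit-learn's `StackingClassifier` would likely be used instead of hand-written code like this.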

Funder

UiT The Arctic University of Norway

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications; Safety, Risk, Reliability and Quality; Information Systems; Software

Link

https://link.springer.com/content/pdf/10.1007/s10207-023-00756-1.pdf

References: 38 articles.

1. Pfleeger, S.L., Bloom, G.: Canning spam: proposed solutions to unwanted email. IEEE Secur. Priv. 3(2), 40–47 (2005)

2. Grier, C., Thomas, K., Paxson, V., Zhang, M.: @spam: the underground on 140 characters or less. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, pp. 27–37 (2010)

3. Agarwal, D.K., Kumar, R.: Spam filtering using SVM with different kernel functions. Int. J. Comput. Appl. 136(5), 16–23 (2016)

4. Heartfield, R., Loukas, G.: A taxonomy of attacks and a survey of defence mechanisms for semantic social engineering attacks. ACM Comput. Surv. (CSUR) 48(3), 1–39 (2015)

5. John, J.P., Moshchuk, A., Gribble, S.D., Krishnamurthy, A.: Studying spamming botnets using Botlab. In: NSDI, vol. 9 (2009)

Cited by 4 articles.

1. Explainable AI-based Framework for Efficient Detection of Spam from Text using an Enhanced Ensemble Technique;Engineering, Technology & Applied Science Research;2024-08-02

2. Enhanced Detection of Text and Image Spam Using Cost-Sensitive Deep Learning;Traitement du Signal;2024-06-26

3. An Investigation of AI-Based Ensemble Methods for the Detection of Phishing Attacks;Engineering, Technology & Applied Science Research;2024-06-01

4. Unveiling Deception in Arabic: Optimization of Deceptive Text Detection Across Formal and Informal Genres;IEEE Access;2024