Efficient Email Spam Classification with N-gram Features and Ensemble Learning

Published: 2024-03-28 Issue: 2 Volume: 10 Page: 278-284
ISSN: 2456-3307
Container-title: International Journal of Scientific Research in Computer Science, Engineering and Information Technology
Short-container-title: Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol.

Author:

Prachi Bhatnagar, Dr. Sheshang Degadwala

Abstract

In this paper, we present an approach to email spam classification that combines N-gram features, TF-IDF weighting, SMOTE oversampling, and tree-based classifiers: Decision Trees, Random Forests, and Ensemble Extra Trees. Our methodology preprocesses the dataset to extract N-gram features, applies TF-IDF weighting to highlight important terms, and addresses class imbalance through SMOTE. We then train and evaluate the classification models and find that the Ensemble Extra Trees algorithm outperforms the others in terms of accuracy, precision, recall, and F1-score. Our experiments on benchmark datasets support the efficacy of the approach, showing improved spam detection accuracy and highlighting the potential of ensemble learning for email spam classification. This research contributes to spam filtering technology by providing a robust and efficient method for identifying and categorizing spam emails.
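
The feature and resampling steps named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the toy emails, the function names (`word_ngrams`, `tfidf_vectors`, `smote_like`), the smoothed IDF formula, and the neighbour count `k` are all assumptions made for illustration, and the classifier step (for example an Extra Trees model from a machine-learning library) is left out.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n_max=2):
    """Build dense TF-IDF vectors over the n-gram vocabulary of docs."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    vocab = sorted({g for c in counts for g in c})
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for c in counts for g in c)
    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1.
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in vocab}
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append([c[g] / total * idf[g] if total else 0.0 for g in vocab])
    return vocab, vectors


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).
    Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((v for v in minority if v is not base),
                            key=lambda v: math.dist(base, v))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: label 1 = spam (minority), 0 = legitimate mail.
emails = [
    ("win a free prize now", 1),
    ("free prize claim now", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday with the team", 0),
    ("please review the agenda", 0),
    ("notes from the team meeting", 0),
]
texts = [t for t, _ in emails]
labels = [y for _, y in emails]

vocab, X = tfidf_vectors(texts)
spam = [x for x, y in zip(X, labels) if y == 1]

# Oversample spam until both classes have the same number of rows.
new_spam = smote_like(spam, n_new=labels.count(0) - len(spam))
X_balanced = X + new_spam
y_balanced = labels + [1] * len(new_spam)
```

The balanced matrix `X_balanced` with labels `y_balanced` is what a classifier would then be trained on. In a real evaluation, oversampling would normally be applied to the training split only, so that synthetic samples do not leak into the test set.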

Publisher

Technoscience Academy

References: 15 articles.

1. K. Taghandiki, “Building an Effective Email Spam Classification Model with spaCy,” pp. 1–5, 2023, [Online]. Available: http://arxiv.org/abs/2303.08792

2. R. Fatima et al., “An Optimized Approach For Detection and Classification of Spam Email’s Using Ensemble Methods,” 2023.

3. L. Jeeva and I. S. Khan, “Enhancing Email Spam Filter’s Accuracy Using Machine Learning,” vol. 5, no. 4, pp. 1–12, 2023.

4. M. A. Bouke, A. Abdullah, and M. T. Abdullah, “A Lightweight Machine Learning-Based Email Spam Detection Model Using Word Frequency Pattern,” vol. 4, no. 1, pp. 15–28, 2023, doi: 10.48185/jitc.v4i1.653.

5. H. Takci and F. Nusrat, “Highly Accurate Spam Detection with the Help of Feature Selection and Data Transformation,” International Arab Journal of Information Technology, vol. 20, no. 1, pp. 29–37, 2023, doi: 10.34028/iajit/20/1/4.