Abstract
Email is now ubiquitous, and reliably distinguishing genuine messages from spam calls for more sophisticated approaches. This study evaluates an ensemble that combines Random Forest and Naive Bayes through a voting classifier for email classification. The analysis uses real email data and applies preprocessing steps, including feature selection and data integrity checks, to ensure its validity; common problems with email data, such as missing values, are handled with robust techniques. The ensemble is trained and evaluated using accuracy and classification reports as the key performance indicators, and confusion matrices illustrate the results for recall, accuracy, and precision. The proposed Voting Classifier is compared against K-Nearest Neighbors, Gaussian Naive Bayes, and Random Forest on the same dataset. It performs well overall, reaching 95.9% accuracy with a precision of 99%, a recall of 96%, and an F1-score of 95%, which indicates that combining the two models improves overall performance and gives a balanced means of email classification. The comparison also sheds light on the relative advantages and disadvantages of the different methods for real-world classification tasks.
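The abstract does not give implementation details, so the following is only a minimal sketch of how a Random Forest and Naive Bayes voting ensemble of this kind might be assembled with scikit-learn. The synthetic data, the median imputation of missing values, the soft-voting mode, and all hyperparameters are assumptions made for illustration, not the authors' setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for numeric email features (assumption: the study's
# real email dataset is not reproduced here).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)

# Blank out roughly 5% of the values to mimic missing entries.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Soft voting averages the two models' class probabilities (assumption:
# the abstract does not say whether hard or soft voting was used).
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)

# Impute missing values inside the pipeline so the imputer is fitted
# on the training split only.
model = make_pipeline(SimpleImputer(strategy="median"), ensemble)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy: {accuracy:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=["ham", "spam"]))
```

The scores this sketch prints come from synthetic data and are not comparable to the figures reported in the abstract. Switching `voting="soft"` to `voting="hard"` would make the ensemble take a majority vote over predicted labels instead of averaging probabilities.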