Naive Bayes Classification for Email Spam Detection


Syed Zain¹, Taher Omar¹


1. University of Wollongong in Dubai, UAE


Email is one of the cheapest forms of communication and is used by nearly every internet user, from individuals to businesses. Its simplicity and wide availability also make it a target for spam sent with malicious intent, which has caused large financial losses and threatened the privacy of millions of individuals. Not all spam is malicious, but it is a nuisance to users regardless. There is therefore a pressing need for spam detection systems that can automatically identify spam emails. This chapter proposes a Naïve Bayes approach to building such a system, using a combination of the Enron Email dataset and the 419 fraud dataset. The datasets are lemmatized to improve both execution time and accuracy, and grid search is among the techniques used to maximize accuracy. Finally, the model is evaluated with several metrics, and a comparative analysis is performed.
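The approach summarized above can be illustrated with a minimal sketch. It is not the chapter's implementation: the toy messages, the `tokenize` helper, the class name `MultinomialNB`, and the smoothing grid are all assumptions made for illustration. Lowercased word splitting stands in for lemmatization, and a single held-out validation set stands in for whatever grid search procedure the authors used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a crude stand-in for lemmatization)."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing; alpha must be > 0."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        docs_per_class = Counter(labels)
        self.log_prior = {c: math.log(docs_per_class[c] / len(labels)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum over tokens of log P(token | c)
            score = self.log_prior[c]
            denom = self.total[c] + self.alpha * len(self.vocab)
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def accuracy(model, docs, labels):
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(labels)


def grid_search(train, val, alphas):
    """Return the (alpha, validation accuracy) pair with the highest accuracy."""
    best_alpha, best_acc = None, -1.0
    for alpha in alphas:
        acc = accuracy(MultinomialNB(alpha).fit(*train), *val)
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc


# Toy data, invented for illustration only (not from the Enron or 419 datasets).
train_docs = [
    "win money now claim your prize",
    "urgent transfer of funds needed from bank",
    "claim free prize money today",
    "meeting schedule for monday attached",
    "please review the quarterly report",
    "lunch meeting moved to tuesday",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
val_docs = ["free money prize", "report for the monday meeting"]
val_labels = ["spam", "ham"]

best_alpha, best_acc = grid_search(
    (train_docs, train_labels), (val_docs, val_labels), alphas=[0.1, 0.5, 1.0]
)
model = MultinomialNB(best_alpha).fit(train_docs, train_labels)
print(best_alpha, best_acc, model.predict("claim your free prize"))
```

On real data the grid would normally be scored with cross-validation rather than one validation split, and accuracy would be reported alongside metrics such as precision and recall.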


IGI Global







