Authors:
Ouyang Qianhe, Tian Jiahe, Wei Jiale
Abstract
E-mail spam filtering has recently become a critical concern in network security, and many machine learning techniques have been applied to this classification problem. With the emergence of machine learning frameworks, many such tasks are now handled by effective algorithms that offer satisfactory performance at high speed. However, how different algorithms actually perform under specific conditions still lacks an intuitive demonstration. This study therefore compares two widely used algorithms, K-nearest neighbors (KNN) and Naive Bayes, on metrics including accuracy and running time, highlighting the distinctive advantage of each when classifying emails. The paper feeds thousands of emails from a spam dataset to both algorithms and analyzes their results separately. The results indicate that the KNN classifier performs better at identifying spam messages, while the Naive Bayes classifier performs better at identifying legitimate (non-spam) messages. Designers can thus choose an appropriate algorithm for a spam filter more easily when the features and properties of their dataset are known.
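The comparison described above (train both classifiers on labeled emails, then compare accuracy, per-class performance, and running time) can be sketched as follows. This is a minimal, from-scratch illustration under our own assumptions, not the authors' implementation: whitespace tokenization, multinomial Naive Bayes with add-one smoothing, KNN with cosine similarity over raw word counts and k = 3, and a tiny hypothetical six-email training set standing in for a real spam corpus. The names `NaiveBayes`, `KNN`, `evaluate`, `TRAIN`, and `TEST` are hypothetical.

```python
import math
import time
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real filters also strip punctuation/HTML.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]  # sum logs to avoid underflow
            for w in tokenize(doc):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class KNN:
    """k-nearest neighbours on word-count vectors, cosine similarity, majority vote."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, docs, labels):
        # "Training" only stores vectors; the cost is paid at prediction time.
        self.vectors = [Counter(tokenize(d)) for d in docs]
        self.labels = list(labels)
        return self

    def predict_one(self, doc):
        q = Counter(tokenize(doc))
        ranked = sorted(((cosine(q, v), y) for v, y in zip(self.vectors, self.labels)), reverse=True)
        votes = Counter(y for _, y in ranked[: self.k])
        return votes.most_common(1)[0][0]


def evaluate(model, train, test):
    """Return (accuracy, per-class recall, seconds spent on fit + predict)."""
    docs, labels = zip(*train)
    start = time.perf_counter()
    model.fit(list(docs), list(labels))
    preds = [model.predict_one(d) for d, _ in test]
    elapsed = time.perf_counter() - start
    truth = [y for _, y in test]
    accuracy = sum(p == y for p, y in zip(preds, truth)) / len(truth)
    recall = {}
    for c in ("spam", "ham"):  # per-class recall exposes class-specific strengths
        idx = [i for i, y in enumerate(truth) if y == c]
        recall[c] = sum(preds[i] == c for i in idx) / len(idx) if idx else float("nan")
    return accuracy, recall, elapsed


# Hypothetical toy emails, only to make the sketch runnable (not the paper's data).
TRAIN = [
    ("win a free prize now", "spam"), ("cheap meds free offer", "spam"),
    ("claim your free money now", "spam"), ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"), ("lunch with the project team", "ham"),
]
TEST = [("free prize claim now", "spam"), ("project report review monday", "ham")]

if __name__ == "__main__":
    for name, model in (("Naive Bayes", NaiveBayes()), ("KNN (k=3)", KNN(k=3))):
        acc, rec, secs = evaluate(model, TRAIN, TEST)
        print(f"{name}: accuracy={acc:.2f}, recall={rec}, time={secs:.6f}s")
```

Reporting recall for each class, rather than overall accuracy alone, is what can reveal an asymmetry like the one described in the abstract, where one classifier is stronger on spam and the other on legitimate mail. The two models also spend their time differently: Naive Bayes does its counting once in `fit`, while KNN's `fit` merely stores vectors and compares each query against every training email, so its prediction cost grows with the size of the training set. The toy data here is far too small to say anything about which classifier is better.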
Publisher
Darcy & Roy Press Co. Ltd.