Abstract
The paper elaborates on how text analysis influences classification—a key part of the spam-filtering process. The authors propose a multistage meta-algorithm for checking classifier performance. As a result, the algorithm allows for the fast selection of the best-performing classifiers as well as for the analysis of higher-dimensionality data. The last aspect is especially important when analyzing large datasets. The approach of cross-validation between different datasets for supervised learning is applied in the meta-algorithm. Three machine-learning methods allowing a user to classify e-mails as desirable (ham) or potentially harmful (spam) messages were compared in the paper to illustrate the operation of the meta-algorithm. The used methods are simple, but as the results showed, they are powerful enough. We use the following classifiers: k-nearest neighbours (k-NNs), support vector machines (SVM), and the naïve Bayes classifier (NB). The conducted research gave us the conclusion that multinomial naïve Bayes classifier can be an excellent weapon in the fight against the constantly increasing amount of spam messages. It was also confirmed that the proposed solution gives very accurate results.
Funder
Narodowe Centrum Badań i Rozwoju
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering
Reference47 articles.
1. 15 Outrageous Email Spam Statistics that Still Ring True in 2018https://www.propellercrm.com/blog/email-spam-statistics
2. Internet Security Threat Reporthttps://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf
3. The history of digital spam
4. Machine learning for email spam filtering: review, approaches and open research problems
5. Machine Learning Methods for Spam E-Mail Classification
Cited by
14 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献