Spam filtering: how the dimensionality reduction affects the accuracy of Naive Bayes classifiers

Published:2010-12-02 Issue:3 Volume:1 Page:183-200
ISSN:1867-4828
Container-title:Journal of Internet Services and Applications
language:en
Short-container-title:J Internet Serv Appl

Author:

Almeida Tiago A.,Almeida Jurandy,Yamakami Akebo

Abstract

E-mail spam has become an increasingly important problem with a big economic impact in society. Fortunately, there are different approaches allowing to automatically detect and remove most of those messages, and the best-known techniques are based on Bayesian decision theory. However, such probabilistic approaches often suffer from a well-known difficulty: the high dimensionality of the feature space. Many term-selection methods have been proposed for avoiding the curse of dimensionality. Nevertheless, it is still unclear how the performance of Naive Bayes spam filters depends on the scheme applied for reducing the dimensionality of the feature space. In this paper, we study the performance of many term-selection techniques with several different models of Naive Bayes spam filters. Our experiments were diligently designed to ensure statistically sound results. Moreover, we perform an analysis concerning the measurements usually employed to evaluate the quality of spam filters. Finally, we also investigate the benefits of using the Matthews correlation coefficient as a measure of performance.

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications,Computer Science Applications

Link

https://link.springer.com/content/pdf/10.1007/s13174-010-0014-7.pdf

References: 49 articles.

1. Almeida T, Yamakami A (2010) Content-based spam filtering. In: Proceedings of the 23rd IEEE international joint conference on neural networks, Barcelona, Spain, pp 1–7

2. Almeida T, Yamakami A, Almeida J (2009) Evaluation of approaches for dimensionality reduction applied with Naive Bayes anti-spam filters. In: Proceedings of the 8th IEEE international conference on machine learning and applications, Miami, FL, USA, pp 517–522

3. Almeida T, Yamakami A, Almeida J (2010) Filtering spams using the minimum description length principle. In: Proceedings of the 25th ACM symposium on applied computing, Sierre, Switzerland, pp 1856–1860

4. Almeida T, Yamakami A, Almeida J (2010) Probabilistic anti-spam filtering with dimensionality reduction. In: Proceedings of the 25th ACM symposium on applied computing, Sierre, Switzerland, pp 1802–1806

5. Androutsopoulos I, Koutsias J, Chandrinos K, Paliouras G, Spyropoulos C (2000) An evaluation of Naive Bayesian anti-spam filtering. In: Proceedings of the 11th European conference on machine learning, Barcelona, Spain, pp 9–17

Cited by 51 articles.

1. Interpretable machine learning-based text classification method for construction quality defect reports;Journal of Building Engineering;2024-07

2. Evaluation of Classification Algorithms for Effective Spam Email Detection Using Spam Email Dataset;2024

3. Highly Accurate Spam Detection with the Help of Feature Selection and Data Transformation;The International Arab Journal of Information Technology;2023

4. Automated Detection of Cystitis in Ultrasound Images Using Deep Learning Techniques;IEEE Access;2023

5. Automated audiometer for home based health care based on mobile app;Proceeding of International Conference on Energy, Manufacture, Advanced Material and Mechatronics 2021;2023