Highly Accurate Spam Detection with the Help of Feature Selection and Data Transformation

Published:2023 Issue:1 Volume:20 Page:
ISSN:2309-4524
Container-title:The International Arab Journal of Information Technology
language:en
Short-container-title:IAJIT

Author:

Takcı Hidayet, Nusrat Fatema

Abstract

Spam volume is growing rapidly alongside the popularity of email, which has created a need to filter spam messages. To date, many knowledge-based, learning-based, and clustering-based methods have been developed for this purpose. This study targets machine-learning-based spam detection using the C4.5, ID3, RndTree, C-Support Vector Classification (C-SVC), and Naïve Bayes algorithms. In addition, feature selection and data transformation methods were applied to improve detection performance. Experiments were performed on the UC Irvine Machine Learning Repository (UCI) spambase dataset, and the results were compared in terms of accuracy, Receiver Operating Characteristic (ROC) analysis, and classification speed. In the accuracy comparison, the C-SVC algorithm achieved the highest accuracy at 93.13%, followed by the RndTree algorithm. In the ROC analysis, the RndTree algorithm achieved the best Area Under the Curve (AUC) value of 0.999, while the C4.5 algorithm gave the second-best result. The fastest methods in terms of classification speed were the Naïve Bayes and RndTree algorithms. The experiments showed that feature selection and data transformation increased spam detection success. The data transformation that increased classification success the most was binary transformation, and the most effective feature selection method was forward selection.
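
The two preprocessing steps named in the abstract, binary transformation and forward selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold of 0, the Bernoulli Naïve Bayes classifier, the validation-accuracy stopping rule, and the toy data produced by the hypothetical `make_toy_data` helper are all assumptions made for illustration, and the real spambase dataset is not used here.

```python
import math
import random


def binarize(rows, threshold=0.0):
    """Binary transformation: 1 if a value exceeds the threshold, else 0 (threshold is an assumption)."""
    return [[1 if v > threshold else 0 for v in row] for row in rows]


def train_bernoulli_nb(X, y, features):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing on the chosen feature indices."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / len(X)
        probs = {f: (sum(r[f] for r in rows) + 1) / (len(rows) + 2) for f in features}
        model[label] = (prior, probs)
    return model


def predict(model, x):
    """Return the label with the highest log-posterior score."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for f, p in probs.items():
            score += math.log(p if x[f] else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(features, X_train, y_train, X_val, y_val):
    """Validation accuracy of a model trained on the given feature subset."""
    model = train_bernoulli_nb(X_train, y_train, features)
    hits = sum(predict(model, x) == t for x, t in zip(X_val, y_val))
    return hits / len(y_val)


def forward_selection(X_train, y_train, X_val, y_val, n_features):
    """Greedy forward selection: add the best feature each round, stop when accuracy stops improving."""
    selected, best = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [(accuracy(selected + [f], X_train, y_train, X_val, y_val), f)
                  for f in remaining]
        # Ties are broken in favor of the lowest feature index.
        top_score, top_f = max(scored, key=lambda s: (s[0], -s[1]))
        if top_score <= best:
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best


def make_toy_data(n=60, seed=0):
    """Hypothetical toy data: feature 0 is informative, feature 1 is noise, feature 2 is constant."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = spam, 0 = ham
        f0 = rng.uniform(0.5, 3.0) if label else 0.0
        f1 = rng.choice([0.0, 1.2])
        X.append([f0, f1, 0.7])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_raw, y = make_toy_data()
    X = binarize(X_raw)
    chosen, score = forward_selection(X[:40], y[:40], X[40:], y[40:], n_features=3)
    print("selected features:", chosen, "validation accuracy:", score)
```

On the toy data the search keeps only the informative feature, since adding the noise or constant feature cannot raise validation accuracy further. On real data, a held-out test set separate from the validation split would be needed to report an unbiased accuracy.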

Publisher

Zarqa University

Subject

General Computer Science

Cited by 6 articles.

1. Feature Selection and Classification of Email Spam Using Orthogonal Linear Jellyfish Swarm Optimizer;2024 Third International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE);2024-04-26

2. Efficient Email Spam Classification with N-gram Features and Ensemble Learning;International Journal of Scientific Research in Computer Science, Engineering and Information Technology;2024-03-28

3. Feature Selection for Robust Spoofing Detection: A Chi-Square-based Machine Learning Approach;2023 2nd International Engineering Conference on Electrical, Energy, and Artificial Intelligence (EICEEAI);2023-12-27

4. Cybersecurity Threats in the Era of AI: Detection of Phishing Domains Through Classification Rules;2023 2nd International Engineering Conference on Electrical, Energy, and Artificial Intelligence (EICEEAI);2023-12-27

5. Classification of Spam and Ham Emails with Machine Learning Techniques for Cyber Security;2023 International Conference on Integrated Intelligence and Communication Systems (ICIICS);2023-11-24