Arabic Spam Tweets Classification: A Comprehensive Machine Learning Approach

Published: 2024-07-02 Volume: 5 Issue: 3 Pages: 1049-1065
ISSN: 2673-2688
Container-title: AI
Language: en
Short-container-title: AI

Author:

Hantom Wafa Hussain¹, Rahman Atta¹

Affiliation:

1. Department of Computer Science (CS), College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam 31441, Saudi Arabia

Abstract

One of the most common problems faced by Twitter (also known as X) users, both individuals and organizations, is dealing with spam tweets. The problem continues to grow with the increasing popularity and user base of social media platforms. Exploiting this reach, spammers post texts, images, and videos containing suspicious links that can be used to spread viruses, rumors, negative marketing, and sarcasm, and potentially to compromise users' information. Spam detection is among the most active research areas in natural language processing (NLP) and cybersecurity. Several studies have addressed it, but they mainly focus on the English language. Arabic tweet spam detection still has a long way to go, especially for the diverse dialects other than Modern Standard Arabic (MSA), since MSA is seldom used in tweets. The situation demands an automated, robust, and efficient Arabic spam tweet detection approach. To address this issue, we investigate various machine learning and deep learning models for detecting spam tweets in Arabic, including Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and Long Short-Term Memory (LSTM). We focus on both the words and the meaning of the tweet text. Across several experiments, the proposed models produced promising results compared with previous approaches on the same and on different datasets. The RF classifier achieved 96.78% accuracy and the LSTM classifier 94.56%, followed by the SVM classifier at 82%. Further, in terms of F1-score, the RF, LSTM, and SVM classifiers improve by 21.38%, 19.16%, and 5.2%, respectively, over schemes evaluated on the same dataset.
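To make the classification setting in the abstract concrete, the following is a minimal sketch of one of the named model families, Naive Bayes, together with the F1-score used for comparison. It is not the authors' implementation: the pure-Python multinomial variant with add-one smoothing, the function names (`train_nb`, `predict_nb`, `f1_score`), and the tiny English placeholder tokens standing in for preprocessed Arabic tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model from tokenized documents."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for c in classes for w in word_counts[c]}
    return {
        "classes": classes,
        "word_counts": word_counts,
        "doc_counts": Counter(labels),
        "vocab": vocab,
        "n_docs": len(docs),
    }


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior for one document."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for c in model["classes"]:
        total = sum(model["word_counts"][c].values())
        # Log prior: fraction of training documents in class c.
        score = math.log(model["doc_counts"][c] / model["n_docs"])
        for w in tokens:
            if w in model["vocab"]:  # words never seen in training are skipped
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log(
                    (model["word_counts"][c][w] + 1) / (total + vocab_size)
                )
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def f1_score(y_true, y_pred, positive="spam"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up toy data (assumption): placeholder tokens, not the paper's dataset.
train_docs = [
    ["win", "free", "prize", "click", "link"],
    ["free", "followers", "click", "link"],
    ["meeting", "tomorrow", "morning"],
    ["great", "match", "tonight"],
]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = [["click", "free", "link"], ["match", "tomorrow"]]
test_labels = ["spam", "ham"]

model = train_nb(train_docs, train_labels)
predictions = [predict_nb(model, doc) for doc in test_docs]
print(predictions, f1_score(test_labels, predictions))
```

On real data the tokens would come from an Arabic preprocessing pipeline, and the score would be computed on a held-out split rather than on two hand-written examples.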

Publisher

MDPI AG

Link

https://www.mdpi.com/2673-2688/5/3/52/pdf
