Policy-Based Spam Detection of Tweets Dataset

Published:2023-06-14 Issue:12 Volume:12 Page:2662
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Dar Momna¹, Iqbal Faiza¹, Latif Rabia², Altaf Ayesha¹, Jamail Nor Shahida Mohd²

Affiliation:

1. Department of Computer Science, University of Engineering and Technology, Lahore P.O. Box 54890, Pakistan

2. Artificial Intelligence and Data Analytics Laboratory, College of Computer and Information Sciences (CCIS), Prince Sultan University, Riyadh P.O. Box 66833, Saudi Arabia

Abstract

Spam from advertisements and from social media platforms such as Facebook, Twitter, and Instagram is increasing, drawing more attention to spam detection. Spam review identification has been studied in many languages, including Chinese, Urdu, Roman Urdu, English, and Turkish; however, fewer high-quality datasets are available for Urdu. This is mainly because Urdu is less widely used on social media networks such as Twitter, making it harder to collect large volumes of relevant data. This paper investigates policy-based spam detection for Urdu tweets. The study aims to collect over 1,100,000 real-time tweets from multiple users. The dataset is carefully filtered to comply with Twitter’s 100-tweet-per-hour limit. For data collection, the snscrape library is used, which provides an API for accessing attributes such as username, URL, and tweet content. A machine learning pipeline is then developed, consisting of TF-IDF and Count Vectorizer features and the following classifiers: multinomial naïve Bayes, a support vector classifier with an RBF kernel, logistic regression, and BERT. Feature extraction is performed based on Twitter policy standards, and the dataset is split into training and testing sets for spam analysis. Experimental results show that the logistic regression classifier achieved the highest accuracy, 99.55%, with an F1-score of 0.70. The findings demonstrate the effectiveness of policy-based spam detection in Urdu tweets using machine learning and BERT-layer models and contribute to the development of a robust spam detection method for Urdu-language social media.
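The count-feature and multinomial naïve Bayes stage named in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the toy English tweets and labels are hypothetical placeholders for the Urdu dataset, the function names are invented for this sketch, and the policy-based features, TF-IDF weighting, and other classifiers are not modelled.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; an assumption that works for any script.
    return text.lower().split()


def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on word counts with Laplace smoothing."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for c in classes for w in word_counts[c]}
    model = {"classes": classes, "vocab": vocab, "log_prior": {}, "log_like": {}}
    for c in classes:
        model["log_prior"][c] = math.log(doc_counts[c] / len(docs))
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_like"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for c in model["classes"]:
        score = model["log_prior"][c]
        for w in tokenize(text):
            if w in model["vocab"]:  # out-of-vocabulary tokens are ignored
                score += model["log_like"][c][w]
        scores[c] = score
    return max(scores, key=scores.get)


def accuracy_and_f1(y_true, y_pred, positive="spam"):
    """Accuracy over all classes and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy data standing in for the training and testing splits.
train_docs = [
    "win free followers click link now",
    "free prize click link",
    "click now free offer link",
    "meeting friends for tea today",
    "weather is lovely in lahore today",
    "reading a good book tonight",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_docs = ["free followers click now", "tea with friends tonight"]
test_labels = ["spam", "ham"]

model = train_mnb(train_docs, train_labels)
predictions = [predict(model, doc) for doc in test_docs]
accuracy, f1 = accuracy_and_f1(test_labels, predictions)
print(predictions, accuracy, f1)
```

Reporting F1 for the spam class alongside accuracy matters when spam is rare: a classifier can score very high accuracy mostly by labelling the majority class correctly, which is one plausible reading of a 99.55% accuracy paired with an F1-score of 0.70.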

Funder

Artificial Intelligence and Data Analytics Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia

University of Engineering and Technology (UET), Lahore

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/12/2662/pdf

References: 44 articles.

1. Alorini, D., and Rawat, D.B. (2019, January 18–21). Automatic spam detection on gulf dialectical. Proceedings of the Conference on Computing, Networking and Communication, Honolulu, HI, USA.

2. Addressing the class imbalance problem in Twitter spam detection using ensemble learning; Liu; Comput. Secur., 2017

3. Wu, T., Liu, S., Zhang, J., and Xiang, Y. (2017, January 31). Twitter spam detection based on deep learning. Proceedings of the Australasian Computer Science Week Multiconference, Geelong, Australia.

4. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions; Alzubaidi; J. Big Data, 2021

5. Improving spam email detection using deep recurrent neural network; Ghouzali; Inst. Adv. Eng. Sci., 2022

Cited by 5 articles.

1. Filtering and Detection of Real-Time Spam Mail Based on a Bayesian Approach in University Networks;Electronics;2024-01-16

2. Detecting Pragmatic Ambiguity in Requirement Specification Using Novel Concept Maximum Matching Approach Based on Graph Network;IEEE Access;2024

3. Detection of Phishing Domain Using Logistic Regression Technique and Feature Extraction Using BERT Classification Model;2023 3rd International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON);2023-12-29

4. Review Evaluation for Hotel Recommendation;Electronics;2023-11-16

5. Assessing Urdu Language Processing Tools via Statistical and Outlier Detection Methods on Urdu Tweets;ACM Transactions on Asian and Low-Resource Language Information Processing;2023-10-13