Geo-Spatial Mapping of Hate Speech Prediction in Roman Urdu

Published: 2023-02-14
Volume: 11, Issue: 4, Page: 969
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics

Author:

Aziz Samia¹, Sarfraz Muhammad Shahzad¹, Usman Muhammad¹, Aftab Muhammad Umar¹, Rauf Hafiz Tayyab²

Affiliation:

1. Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Chiniot-Faisalabad Campus, Chiniot 35400, Pakistan

2. Centre for Smart Systems, AI and Cybersecurity, Staffordshire University, Stoke-on-Trent ST4 2DE, UK

Abstract

Social media has become a crucial channel for political expression. Twitter, in particular, is a major platform for exchanging political hate speech in Pakistan. Political hate speech damages the public image of politicians, targets their supporters, and hurts public sentiment. Hate speech is controversial public speech that promotes violence toward a person or group based on specific characteristics. Although studies have been conducted to identify hate speech in European languages, romanized languages such as Roman Urdu have yet to receive much attention. In this work, we present the automatic detection of political hate speech in Roman Urdu. We developed a dedicated labeled dataset of political hate speech (RU-PHS) containing 5002 instances with city-level information. To cope with the wide lexical variation of Roman Urdu, we propose an algorithm for its lexical unification. Three vectorization techniques are used: TF-IDF, word2vec, and fastText. We present a comparative analysis of the accuracy and time complexity of conventional machine learning models and fine-tuned neural networks using dense word representations for classifying and predicting political hate speech. The results show that a random forest and the proposed feed-forward neural network achieve an accuracy of 93% with fastText word embeddings in distinguishing neutral from politically offensive speech. Statistical analysis helps identify trends and patterns, and hotspot and cluster analysis pinpoint Punjab as a highly susceptible area in Pakistan in terms of political hate tweet generation.

Publisher

MDPI AG

Subject

General Mathematics,Engineering (miscellaneous),Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/11/4/969/pdf

References (73 articles)

1. Gitari (2015). A lexicon-based approach for hate speech detection. Int. J. Multimed. Ubiquitous Eng.

2. Aslam, S. (2022, June 08). Twitter by the Numbers: Stats, Demographics & Fun Facts. Available online: https://www.omnicoreagency.com/twitter-statistics/.

3. Djuric, N., Zhou, J., Morris, R., Grbovic, M., Radosavljevic, V., and Bhamidipati, N. (2015, January 18–22). Hate speech detection with comment embeddings. Proceedings of the 24th International Conference on World Wide Web, Florence, Italy.

4. Saeed (2021). Roman Urdu toxic comment classification. Lang. Resour. Eval.

5. Naqvi (2020). Roman Urdu news headline classification empowered with machine learning. Comput. Mater. Contin.

Cited by 2 articles.

1. Hate Speech Detection in Roman Urdu using Machine Learning Techniques. 2024 5th International Conference on Advancements in Computational Sciences (ICACS), 2024-02-19.

2. So-haTRed: A Novel Hybrid System for Turkish Hate Speech Detection in Social Media With Ensemble Deep Learning Improved by BERT and Clustered-Graph Networks. IEEE Access, 2024.