HateCircle and Unsupervised Hate Speech Detection Incorporating Emotion and Contextual Semantics

Published: 2023-03-24 Issue: 4 Volume: 22 Page: 1-28
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Ghosal Sayani¹, Jain Amita²

Affiliation:

1. Netaji Subhas University of Technology East Campus (erstwhile A.I.A.C.T.R.), Guru Gobind Singh Indraprastha University, Delhi, India

2. Netaji Subhas University of Technology, Delhi, India

Abstract

The explosive growth of social media has fueled an extensive increase in online freedom of speech. This worldwide platform for human voices makes it possible to assail other users without facing any consequences and to flout social etiquette, resulting in an inevitable increase in hate speech. English hate speech detection is now a popular research area, but the prevalence of implicit hate content in regional languages demands effective language-independent models. The proposed research is the first unsupervised Hindi and Bengali hate content detection framework, consisting of three significant components: HateCircle, hate tweet classification, and code-switch data preparation algorithms. The novel HateCircle method is proposed to detect the hate orientation of each term using co-occurrence patterns of words, contextual semantics, and emotion analysis. An efficient multiclass hate tweet classification algorithm is proposed using parts-of-speech tagging, Euclidean distance, and the geometric median. Hate content is detected more efficiently in the native script than in the Roman script, so a transliteration algorithm is also proposed for code-switch data preparation. The experiments evaluate combinations of various lexicons with our enriched hate lexicon, which achieve a maximum F1-score of 0.74 for the Hindi and 0.88 for the Bengali datasets. The novel HateCircle and hate tweet detection framework is evaluated with our proposed parts-of-speech tagging and geometric median detection methods. Results reveal that the HateCircle and hate tweet detection framework also achieves a maximum accuracy of 0.73 for the Hindi and 0.78 for the Bengali dataset. The experimental results signify that contextual semantic hate speech detection with language independence offsets the growth of implicit abusive text in social media.
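
The abstract names Euclidean distance and the geometric median as ingredients of the tweet classification step but does not spell out the procedure. As a rough illustration only, the following sketch shows one generic way such pieces can fit together: a geometric median computed with Weiszfeld's iteration (a standard technique, not confirmed to be the authors' choice), followed by assigning a point to the class whose median is nearest in Euclidean distance. The function names, the 2-D feature points, and the class labels are all hypothetical and do not come from the paper.

```python
import math


def geometric_median(points, iterations=200, eps=1e-9):
    """Approximate the geometric median of `points` with Weiszfeld's iteration.

    Assumption: this is a generic textbook method, not the paper's algorithm.
    """
    dim = len(points[0])
    # Start from the centroid of the points.
    current = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iterations):
        weighted = [0.0] * dim
        weight_sum = 0.0
        for p in points:
            # Clamp the distance so a point that coincides with the
            # current estimate does not cause a division by zero.
            w = 1.0 / max(math.dist(p, current), eps)
            weight_sum += w
            for i in range(dim):
                weighted[i] += w * p[i]
        updated = [x / weight_sum for x in weighted]
        converged = math.dist(updated, current) < eps
        current = updated
        if converged:
            break
    return current


def nearest_label(point, class_medians):
    """Return the label whose median is closest to `point` (Euclidean distance)."""
    return min(class_medians, key=lambda label: math.dist(point, class_medians[label]))


# Hypothetical 2-D feature points per class (illustrative values only).
class_points = {
    "hate": [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0)],
    "neutral": [(0.1, 0.0), (0.0, 0.2), (0.2, 0.1)],
}
class_medians = {label: geometric_median(pts) for label, pts in class_points.items()}
print(nearest_label((0.7, 0.9), class_medians))
```

The geometric median is less sensitive to outlying points than the mean, which is one plausible reason to prefer it as a class representative; whether the paper uses it this way is an assumption of this sketch.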

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3576913

References: 61 articles.

1. Twitter Revenue and Usage Statistics. 2022. BusinessofApps. Retrieved January 11 2022 from https://www.businessofapps.com/data/twitter-statistics/.

2. Statista Research Department. 2022. Number of Data Removal Requests Issued to Twitter from July to December 2020 by Country and Institution. Statista. Retrieved July 2022 from https://www.statista.com/statistics/234858/number-of-requests-for-data-removal-from-twitter/.

3. A deep neural network based multi-task learning approach to hate speech detection

4. Wikipedia. 2022. List of Languages by Total Number of Speakers. Retrieved January 15 2022 from https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers.

5. A Survey on Automatic Detection of Hate Speech in Text

Cited by 8 articles.

1. Artificial Intelligence inspired method for cross-lingual cyberhate detection from low resource languages;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-08-16

2. HumourHindiNet: Humour detection in Hindi web series using word embedding and convolutional neural network;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-06-26

3. A survey on multi-lingual offensive language detection;PeerJ Computer Science;2024-03-29

4. Hate Speech Detection in Tweets using Support Vector Machine;2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT);2024-02-09

5. Negative Stances Detection from Multilingual Data Streams in Low-Resource Languages on Social Media Using BERT and CNN-Based Transfer Learning Model;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-01-15