THAR- Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection

Published: 2024-03-18
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: English
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Authors:

Sharma Deepawali¹, Singh Aakash¹, Singh Vivek Kumar²

Affiliations:

1. Department of Computer Science, Banaras Hindu University, Varanasi, India

2. Department of Computer Science, University of Delhi, Delhi, India

Abstract

During the last decade, social media has gained significant popularity as a medium for individuals to express their views on various topics. However, some individuals also exploit social media platforms to spread hatred through their comments and posts, some of which target individuals, communities, or religions. Given the deep emotional connections people have to their religious beliefs, this form of hate speech can be divisive and harmful, and may result in mental health issues as well as social disorder. Therefore, there is a need for algorithmic approaches to the automatic detection of hate speech. Most existing studies in this area focus on social media content in English, and as a result several low-resource languages lack computational resources for the task. This study attempts to address this research gap by providing a high-quality annotated dataset designed specifically for identifying hate speech against religions in Hindi-English code-mixed language. This dataset, “Targeted Hate Speech Against Religion” (THAR), consists of 11,549 comments annotated by five independent annotators. It supports two subtasks: (i) Subtask-1 (binary classification) and (ii) Subtask-2 (multi-class classification). To assess annotation quality, inter-annotator agreement was measured with Fleiss' kappa. The suitability of the dataset is then further explored by applying several standard deep learning and transformer-based models. The transformer-based model Multilingual Representations for Indian Languages (MuRIL) is found to outperform the other implemented models in both subtasks, achieving macro-average and weighted-average F1 scores of 0.78 and 0.78 for Subtask-1, and 0.65 and 0.72 for Subtask-2, respectively. The experimental results not only confirm the suitability of the dataset but also advance research on the automatic detection of hate speech, particularly in low-resource Hindi-English code-mixed language.
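The abstract relies on two standard quantities: Fleiss' kappa for agreement among the five annotators, and macro- versus weighted-average F1 for model evaluation. Below is a minimal pure-Python sketch of both. It is not the authors' code; the input layout (a per-comment table counting how many annotators chose each label) and the function names `fleiss_kappa` and `f1_scores` are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a per-item count table (hypothetical input layout).

    counts[i][j] is the number of annotators who gave item i label j; every
    row must sum to the same number of annotators n (e.g. 5 for THAR).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # Observed agreement per item: share of annotator pairs that agree.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall label proportions.
    total = n_items * n_raters
    p_labels = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_labels)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one label")
    return (p_bar - p_e) / (1.0 - p_e)


def f1_scores(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> tuple[float, float]:
    """Return (macro F1, support-weighted F1) over all labels seen."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = set(y_true) | set(y_pred)
    support = Counter(y_true)
    per_label = {}
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_label[lab] = 2 * tp / denom if denom else 0.0
    # Macro: every label counts equally, so rare labels weigh as much as common ones.
    macro = sum(per_label.values()) / len(labels)
    # Weighted: each label's F1 is weighted by its number of true examples.
    weighted = sum(per_label[lab] * support[lab] for lab in labels) / len(y_true)
    return macro, weighted
```

For example, `f1_scores([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0])` gives a macro F1 of 7/9 (about 0.78) and a weighted F1 of 22/27 (about 0.81): the weighted average exceeds the macro average because the better-scoring label has more examples, the same direction as the gap reported for Subtask-2. For real experiments, scikit-learn's `f1_score` (with `average="macro"` or `average="weighted"`) and statsmodels' `fleiss_kappa` provide tested implementations.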

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3653017

Cited by 4 articles.

1. Misogynistic attitude detection in YouTube comments and replies: A high-quality dataset and algorithmic models; Computer Speech & Language; 2025-01

2. Using Explainable AI (XAI) for Identification of Subjectivity in Hate Speech Annotations for Low-Resource Languages; 4th International Workshop on OPEN CHALLENGES IN ONLINE SOCIAL NETWORKS; 2024-09-10

3. Should we stay silent on violence? An ensemble approach to detect violent incidents in Spanish social media texts; Natural Language Processing; 2024-09-06

4. MIMIC: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language; ACM Transactions on Asian and Low-Resource Language Information Processing; 2024-04-04