CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse

Author:

Jafri Farhan Ahmad (1), Rauniyar Kritesh (2), Thapa Surendrabikram (3), Siddiqui Mohammad Aman (1), Khushi Matloob (4), Naseem Usman (5)

Affiliation:

1. Jamia Millia Islamia, New Delhi, India

2. Delhi Technological University, Delhi, India

3. Department of Computer Science, Virginia Tech, Blacksburg, United States

4. Department of Computer Science, Brunel University London, London, United Kingdom of Great Britain and Northern Ireland

5. School of Computing, Macquarie University, Sydney, Australia

Abstract

In the ever-evolving landscape of online discourse and political dialogue, the rise of hate speech poses a significant challenge to maintaining a respectful and inclusive digital environment. The context becomes particularly complex when considering the Hindi language—a low-resource language with limited available data. To address this pressing concern, we introduce the CHUNAV dataset—a collection of 11,457 Hindi tweets gathered during assembly elections in various states. CHUNAV is purpose-built for hate speech categorization and the identification of target groups. The dataset is a valuable resource for exploring hate speech within the distinctive socio-political context of Indian elections. The tweets within CHUNAV have been meticulously categorized into “Hate” and “Non-Hate” labels, and further subdivided to pinpoint the specific targets of hate speech, including “Individual”, “Organization”, and “Community” labels (as shown in Figure 1). Furthermore, this paper presents multiple benchmark models for hate speech detection, along with an innovative ensemble and oversampling-based method. The paper also delves into the results of topic modeling, all aimed at effectively addressing hate speech and target identification in the Hindi language. This contribution seeks to advance the field of hate speech analysis and foster a safer and more inclusive online space within the distinctive realm of Indian Assembly Elections.

Publisher

Association for Computing Machinery (ACM)


