Abstract
Although social media helps spread knowledge more effectively, it also facilitates the propagation of online abuse and harassment, including hate speech. Preventing hate speech is crucial because it can have serious adverse effects on both individuals and society. It is therefore important for models not only to detect hate speech but also to explain why a given text is toxic. While much research addresses the detection of online hate speech in English, very little addresses low-resource languages such as Hindi or the explainability of hate speech. Recent laws such as the “right to explanations” of the General Data Protection Regulation have spurred research into interpretable models rather than a sole focus on performance. Motivated by this, we create hate speech explanation (HHES), the first interpretable benchmark hate speech corpus in the Hindi language, in which each hate post is labeled with its stereotypical bias and target group category. Providing descriptions of the internal stereotypical bias as an explanation of a hate post makes a hate speech detection model more trustworthy. This work proposes a commonsense-aware unified generative framework, CGenEx, which reframes the multitask problem as a text-to-text generation task. The novelty of this framework is that it can solve two different categories of tasks (generation and classification) simultaneously. We establish the efficacy of our proposed model (CGenEx-fuse) over other baselines on various evaluation metrics when applied to the Hindi HHES dataset.
Disclaimer
This article contains profanity, which is unavoidable given the nature of the work. It in no way reflects the opinions of the authors.
Publisher
Cambridge University Press (CUP)