Addressing religious hate online: from taxonomy creation to automated detection

Published: 2022-12-15 Volume: 8 Page: e1128
ISSN:2376-5992
Container-title:PeerJ Computer Science
language:en

Author:

Ramponi Alan¹, Testa Benedetta², Tonelli Sara¹, Jezek Elisabetta²

Affiliation:

1. Fondazione Bruno Kessler, Trento, Italy

2. Dipartimento di Studi Umanistici, Università di Pavia, Pavia, Italy

Abstract

Abusive language in online social media is a pervasive and harmful phenomenon that calls for automatic computational approaches to be successfully contained. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on misogyny and racism. A currently underexplored area in this context is religious hate, for which efforts in data and methods to date have been rather scattered. This is exacerbated by the different annotation schemes that available datasets use, which inevitably leads to poor repurposing of data in wider contexts. Furthermore, religious hate is highly dependent on country-specific factors, including the presence and visibility of religious minorities, societal issues, historical background, and current political decisions. Motivated by the lack of annotated data specifically tailored to religion and the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech detection. This scheme rests on a wider and highly interoperable taxonomy of abusive language, and covers the three main monotheistic religions: Judaism, Christianity and Islam. Moreover, we introduce a Twitter dataset in two languages, English and Italian, that has been annotated following the proposed annotation scheme. We experiment with several classification algorithms on the annotated dataset, from traditional machine learning classifiers to recent transformer-based language models, assessing the difficulty of two tasks: abusive language detection and religious hate speech detection. Finally, we investigate the cross-lingual transferability of multilingual models on the tasks, shedding light on the viability of repurposing our dataset for religious hate speech detection on low-resource languages. We release the annotated data and publicly distribute the code for our classification experiments at https://github.com/dhfbk/religious-hate-speech.

Funder

PROTECTOR European project

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-1128.pdf

Cited by 3 articles.

1. Text data augmentation and pre-trained Language Model for enhancing text classification of low-resource languages;PeerJ Computer Science;2024-03-29

2. Special issue on analysis and mining of social media data;PeerJ Computer Science;2024-02-29

3. The Semiotics of Xenophobia and Misogyny on Digital Media;Advances in Media, Entertainment, and the Arts;2023-06-30