A Morpho-syntactic Analysis of Human-moderated Hate Speech Samples from Wykop.pl Web Service

Published: 2024-02-18 Issue: 2 Volume: 8 Pages: 54-71
ISSN: 2543-7844
Container-title: Półrocznik Językoznawczy Tertium
Short-container-title: Tertium

Authors:

Okulska Inez, Kołos Anna

Abstract

The rapid growth of user-generated content on the web makes it hard to protect Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also limiting the spread of wrongful conduct. Designing automated detection models for such content remains complex, particularly for languages with limited publicly available data. To address this, we collaborate with the Wykop.pl web service to fine-tune a model on genuine content that has been banned by professional moderators. In this paper, we focus on the Polish language, discuss datasets and annotation frameworks, and present a stylometric analysis of Wykop.pl content that identifies morpho-syntactic structures commonly used in cyberbullying and hate speech. In doing so, we contribute to the ongoing discussion of offensive language and hate speech in sociolinguistic studies, emphasizing the need to consider user-generated online content.

Publisher

Estonian Literary Museum Scholarly Press
