Affiliation:
1. Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, Norrbotten, Sweden
Abstract
Hate speech is a pressing issue in today’s society that cuts across numerous strategic areas, including human rights protection, refugee protection, and the fight against racism and discrimination. The gravity of the subject is underlined by António Guterres, the United Nations Secretary-General, who called it “a menace to democratic values, social stability, and peace”. A central platform for the spread of hate speech is the Internet, and social media in particular. Automatic detection of hateful and offensive content on these platforms is therefore a crucial challenge, and overcoming it would contribute strongly to an equal and sustainable society. One significant difficulty in meeting this challenge is collecting sufficient labeled data. In our work, we examine how various resources can be leveraged to circumvent this difficulty. We carry out extensive experiments to exploit various data sources using different machine learning models, including state-of-the-art transformers. We find that, using our proposed methods, one can attain state-of-the-art performance in detecting hate speech on Twitter (outperforming the winners of both the HASOC 2019 and HASOC 2020 competitions). We observe that, in general, adding more data either improves performance or at least does not decrease it. Even when using good language models and knowledge transfer mechanisms, the best results were attained using data from one or two additional data sets.