Abstract
We address the challenging problem of identifying the main sources of hate speech on Twitter. On the one hand, we carefully annotate a large set of tweets for hate speech and apply deep learning to produce high-quality hate speech classification models. On the other hand, we create retweet networks, detect communities, and monitor their evolution through time. This combined approach is applied to three years of Slovenian Twitter data and yields several findings. Hate speech is dominated by offensive tweets related to political and ideological issues. The share of unacceptable tweets increases moderately over time, from an initial 20% to 30% by the end of 2020. Unacceptable tweets are retweeted significantly more often than acceptable tweets. About 60% of unacceptable tweets are produced by a single right-wing community of only moderate size. Institutional and media Twitter accounts post significantly fewer unacceptable tweets than individual accounts. In fact, the main sources of unacceptable tweets are anonymous accounts and accounts that were suspended or closed during the years 2018–2020.
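The combination of tweet labels and retweet communities described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy data, the function names `find_communities` and `unacceptable_share`, and the use of connected components as a crude stand-in for a proper community detection method are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (retweeter, original_author) pairs.
retweets = [("a", "b"), ("b", "c"), ("d", "e")]
# Hypothetical classifier output: (author, is_unacceptable) per tweet.
tweets = [("a", True), ("b", False), ("c", True), ("d", False), ("e", False)]


def find_communities(edges):
    """Map each user to a community label.

    Assumption: connected components of the undirected retweet graph
    stand in for a real community detection algorithm.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    community = {}
    for start in sorted(neighbours):
        if start in community:
            continue
        # Depth-first traversal labels every reachable user with `start`.
        community[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in community:
                    community[nxt] = start
                    stack.append(nxt)
    return community


def unacceptable_share(labelled_tweets, community):
    """Return the fraction of unacceptable tweets per community label."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for author, is_unacceptable in labelled_tweets:
        label = community.get(author)
        if label is None:
            continue  # author never appears in the retweet network
        totals[label] += 1
        flagged[label] += int(is_unacceptable)
    return {label: flagged[label] / totals[label] for label in totals}


communities = find_communities(retweets)
shares = unacceptable_share(tweets, communities)
```

With real data, the per-community shares would be computed separately for each time window in order to follow how the communities evolve.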
Funder
Javna Agencija za Raziskovalno Dejavnost RS
European Union’s Rights, Equality and Citizenship Programme
Publisher
Public Library of Science (PLoS)
Cited by
19 articles.
1. A survey on multi-lingual offensive language detection;PeerJ Computer Science;2024-03-29
2. Analysing the Spread of Toxicity on Twitter;Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD);2024-01-04
3. MLHS-CGCapNet: A Lightweight Model for Multilingual Hate Speech Detection;IEEE Access;2024
4. The Impact of Sentiment in Social Network Communication;Advances in Intelligent Systems and Computing;2024
5. The systemic impact of deplatforming on social media;PNAS Nexus;2023-10-25