SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media

Published: 2023-08-03
Pages: 1-20
ISSN: 1351-3249
Journal: Natural Language Engineering (Nat. Lang. Eng.)
Language: English

Authors:

Su, Xuanyu; Li, Yansong; Branco, Paula; Inkpen, Diana

Abstract

Anti-Asian speech during the COVID-19 pandemic has been a serious problem with severe consequences, as a wave of hate speech swept social media platforms. The timely detection of Anti-Asian COVID-19-related hate speech is of utmost importance, not only to allow the application of preventive mechanisms but also to anticipate and possibly prevent similar discriminatory situations in the future. In this paper, we address the problem of detecting Anti-Asian COVID-19-related hate speech in social media data. Previous approaches to this problem used transformer-based models (BERT/RoBERTa) trained on annotated datasets from the same topic and achieved good performance. However, this requires large annotated datasets that are closely tied to the topic, and both requirements are difficult to meet without reliable, vast, and costly resources. We propose a robust semi-supervised model, SSL-GAN-RoBERTa, that learns from a limited, heterogeneous labeled dataset and whose performance is further enhanced by large amounts of unlabeled data from a related domain. Experimental results show that, compared with the RoBERTa baseline, the model achieves substantial gains in Accuracy and Macro-F1 score across scenarios that use data from different domains. The proposed model achieves state-of-the-art results while using unlabeled data efficiently, suggesting promising applicability to other complex classification tasks where large amounts of labeled examples are difficult to obtain.
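The abstract does not spell out the training objective. To illustrate how a semi-supervised GAN classifier can learn from unlabeled text at all, here is a minimal sketch of the per-example discriminator loss in the K+1-class formulation used by GAN-BERT. It assumes SSL-GAN-RoBERTa follows this general pattern with RoBERTa as the encoder, which the abstract does not confirm. The function names `softmax` and `discriminator_loss`, and the convention that the last logit is the extra "fake" class, are hypothetical. The paper's actual formulation may differ.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(logits: List[float], label: Optional[int], is_real: bool) -> float:
    """Per-example loss for a hypothetical K+1-class GAN discriminator.

    ASSUMPTION (not taken from the paper): `logits` has K+1 entries.
    Indices 0..K-1 are the real classes (e.g. hate / non-hate), and the
    last index is an extra "fake" class for generator outputs.
    `label` is None for unlabeled real examples.
    """
    probs = softmax(logits)
    fake = len(logits) - 1
    eps = 1e-12  # guards against log(0)

    if not is_real:
        # Generated example: the discriminator should predict "fake".
        return -math.log(probs[fake] + eps)

    # Unsupervised term: any real post, labeled or not, should not look fake.
    loss = -math.log(1.0 - probs[fake] + eps)

    if label is not None:
        # Supervised term: cross-entropy over the K real classes only,
        # i.e. -log p(y = label | x, y is a real class).
        real_probs = softmax(logits[:fake])
        loss += -math.log(real_probs[label] + eps)
    return loss
```

In this sketch, an unlabeled post contributes only through the "not fake" term. That is one way unlabeled data from a related domain could still shape the shared encoder without any annotations. A real implementation would batch these terms and backpropagate through the encoder, and a generator would be trained against the "fake" term in alternation.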

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

Cited by 4 articles.

1. Semi-Supervised Training for (Pre-Stack) Seismic Data Analysis. Applied Sciences, 2024-05-15.

2. Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach. Entropy, 2024-04-18.

3. Hate speech detection in social media: Techniques, recent trends, and future challenges. WIREs Computational Statistics, 2024-03.

4. Back Translation-EDA and Transformer for Hate Speech Classification in Indonesian. 2023 International Conference on Informatics, Multimedia, Cyber and Informations System (ICIMCIS), 2023-11-07.