An Evaluation of Multilingual Offensive Language Identification Methods for the Languages of India

Published: 2021-07-29 Issue: 8 Volume: 12 Page: 306
ISSN: 2078-2489
Container-title: Information
Language: en
Short-container-title: Information

Author:

Ranasinghe Tharindu, Zampieri Marcos

Abstract

The pervasiveness of offensive content in social media has become an important reason for concern for online platforms. With the aim of improving online safety, a large number of studies applying computational models to identify such content have been published in the last few years, with promising results. The majority of these studies, however, deal with high-resource languages such as English due to the availability of datasets in these languages. Recent work has addressed offensive language identification from a low-resource perspective, exploring data augmentation strategies and trying to take advantage of existing multilingual pretrained models to cope with data scarcity in low-resource scenarios. In this work, we revisit the problem of low-resource offensive language identification by evaluating the performance of multilingual transformers in offensive language identification for languages spoken in India. We investigate languages from different families such as Indo-Aryan (e.g., Bengali, Hindi, and Urdu) and Dravidian (e.g., Tamil, Malayalam, and Kannada), creating important new technology for these languages. The results show that multilingual offensive language identification models perform better than monolingual models and that cross-lingual transformers show strong zero-shot and few-shot performance across languages.

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/12/8/306/pdf

Reference: 54 articles.

1. SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)

2. SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter

3. Predicting the Type and Target of Offensive Posts in Social Media

4. SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

Cited by 21 articles.

1. A survey on multi-lingual offensive language detection;PeerJ Computer Science;2024-03-29

2. Automatic Detection of Multilingual Misogynistic Content in Social Media Data Based on Machine Learning Approach;2024 International Conference on Integrated Circuits and Communication Systems (ICICACS);2024-02-23

3. Hate Speech Detection in Indian Languages: A Brief Survey;2023 IEEE 2nd International Conference on Data, Decision and Systems (ICDDS);2023-12-01

4. Offensive Sentiment Detection with Chat GPT and Other Transformers in Kannada;2023 IEEE 2nd International Conference on Data, Decision and Systems (ICDDS);2023-12-01

5. Transformer-based Models for Language Identification: A Comparative Study;2023 International Conference on System, Computation, Automation and Networking (ICSCAN);2023-11-17