Affiliation:
1. Computer Engineering, Dokuz Eylul University, Izmir 35390, Turkey
2. Information Technology, Sulaimani University, Sulaymanyah 46001, Iraq
Abstract
Cyberbullying, a widespread issue in digital communication, involves using online platforms to harass or demean individuals. Addressing it effectively requires understanding how it manifests across different linguistic contexts. This study presents a novel approach to cyberbullying detection in seven languages, examined under two research paradigms: monolingual and multilingual. The monolingual approach develops and tests detection models within a single language. The multilingual approach instead trains one unified model on data pooled from multiple languages, aiming to exploit broader linguistic diversity and improve the model's generalizability. We evaluated three models: SONAR-DNN, MUSE-CNN-BiLSTM, and XLM-RoBERTa. SONAR-DNN performed best. It feeds SONAR's sentence-level embeddings into a deep neural network (DNN) classifier, which makes it well suited to the varied forms cyberbullying takes across languages. Our results indicate that multilingual models outperform monolingual ones, particularly for languages with substantial data representation, such as Arabic and English. Across seven diverse datasets, our models outperform the best previously reported results on six. These results underscore the robustness of our approach and mark an essential advancement in cyberbullying detection.
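The abstract does not give layer sizes, optimizer settings, or training details, so the following is only a minimal NumPy sketch of the general idea it describes: a feed-forward classifier over fixed-size sentence embeddings, trained on examples pooled from several languages (the multilingual setting). It does not call the real SONAR encoder. Random vectors stand in for embeddings, the 1024-dimensional size is an assumption, and all function names (`make_synthetic_language`, `pool_languages`, `train`, `predict`) and hyperparameters are hypothetical.

```python
import numpy as np

EMB_DIM = 1024  # assumption: fixed-size sentence embeddings of this width


def make_synthetic_language(rng, n, direction):
    """Hypothetical stand-in for one language's embedded dataset.

    Labels are 0/1 (1 = bullying); positives are shifted along `direction`.
    """
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, EMB_DIM)) + (2 * y - 1)[:, None] * direction
    return X, y.astype(float)


def pool_languages(datasets):
    """Multilingual setting: concatenate every language's data into one set."""
    Xs, ys = zip(*datasets.values())
    return np.vstack(Xs), np.concatenate(ys)


def forward(params, X):
    h = np.maximum(0.0, X @ params["W1"] + params["b1"])  # ReLU hidden layer
    logits = (h @ params["W2"] + params["b2"]).ravel()
    return h, 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> P(bullying)


def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def train(X, y, hidden_dim=64, lr=0.1, epochs=50, seed=0):
    """Full-batch gradient descent on a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.normal(0.0, 0.02, size=(X.shape[1], hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.02, size=(hidden_dim, 1)),
        "b2": np.zeros(1),
    }
    n, losses = X.shape[0], []
    for _ in range(epochs):
        h, p = forward(params, X)
        losses.append(bce(p, y))
        d_logits = (p - y)[:, None] / n  # dLoss/dlogits for sigmoid + BCE
        d_h = (d_logits @ params["W2"].T) * (h > 0)  # backprop through ReLU
        params["W2"] -= lr * (h.T @ d_logits)
        params["b2"] -= lr * d_logits.sum(axis=0)
        params["W1"] -= lr * (X.T @ d_h)
        params["b1"] -= lr * d_h.sum(axis=0)
    return params, losses


def predict(params, X, threshold=0.5):
    _, p = forward(params, X)
    return (p >= threshold).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    direction = rng.normal(size=EMB_DIM)
    direction *= 3.0 / np.linalg.norm(direction)
    data = {lang: make_synthetic_language(rng, 100, direction)
            for lang in ["ar", "en", "tr"]}
    X, y = pool_languages(data)
    params, losses = train(X, y)
    print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A monolingual variant would call `train` on a single entry of `data` instead of the pooled arrays, which is the contrast the abstract draws between the two paradigms.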
Publisher
World Scientific Pub Co Pte Ltd