Hate speech detection in the Bengali language: a comprehensive survey

Published: 2024-07-23 Issue: 1 Volume: 11
ISSN: 2196-1115
Container-title: Journal of Big Data
Language: en
Short-container-title: J Big Data

Author:

Al Maruf Abdullah, Abidin Ahmad Jainul, Haque Md. Mahmudul, Jiyad Zakaria Masud, Golder Aditi, Alubady Raaid, Aung Zeyar

Abstract

The detection of hate speech (HS) on online platforms has become extremely important for maintaining a safe and inclusive environment. While significant progress has been made in English-language HS detection, methods for detecting HS in other languages, such as Bengali, have received far less attention. In this survey, we outlined the key challenges specific to HS detection in Bengali, including the scarcity of labeled datasets, linguistic nuances, and contextual variations. We also examined the approaches and methodologies that researchers have employed to address these challenges, including classical machine learning techniques, ensemble approaches, and more recent deep learning advancements. Furthermore, we explored the performance metrics used for evaluation, including accuracy, precision, recall, the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), sensitivity, specificity, and F1 score, providing insights into the effectiveness of the proposed models. Additionally, we identified the limitations and future directions of research in Bengali HS detection, highlighting the need for larger annotated datasets, cross-lingual transfer learning techniques, and the incorporation of contextual information to improve detection accuracy. This survey provides a comprehensive overview of the current state-of-the-art HS detection methods used for Bengali text and serves as a valuable resource for researchers and practitioners interested in understanding the advancements, challenges, and opportunities in addressing HS in the Bengali language, ultimately assisting in the creation of reliable and effective HS detection systems for online platforms.

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s40537-024-00956-z.pdf
