Inculcating Context for Emoji Powered Bengali Hate Speech Detection using Extended Fuzzy SVM and Text Embedding Models

Published: 2023-03-27
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Ghosal Sayani¹, Jain Amita², Tayal Devendra Kumar³, Menon Varun G.⁴, Kumar Akshi⁵

Affiliation:

1. Netaji Subhas University of Technology East Campus (erstwhile A.I.A.C.T.R.), Guru Gobind Singh Indraprastha University, Delhi, India

2. Netaji Subhas University of Technology, Delhi, India

3. Indira Gandhi Delhi Technical University for Women, Delhi, India

4. SCMS School of Engineering and Technology, Kerala, India

5. Manchester Metropolitan University, Manchester, UK

Abstract

The massive growth of the social web offers opportunities to communicate in diverse languages, using unstructured text, informal posts, misspelled content, and emojis. Social media users feel comfortable expressing their emotions, especially high-intensity emotions (hate speech), in their mother tongue. Hate speech in any form targets groups and individuals and may trigger antisocial activities, hate crimes, and terrorist acts. Bengali social media users use Bengali to post implicit or indirect hate text. Existing Bengali hate speech detection research considers explicit hate speech, but in practice hate is more often expressed implicitly. To detect both implicit and explicit hate speech in low-resource content, the social web needs highly efficient automated tools. Researchers have applied discriminative learning approaches (e.g., SVM, MLP, CNN) to distinguish hate text, with clear-cut outcomes only in detecting direct hate speech. The proposed novel Bengali hate speech detection model takes two parallel approaches: (i) it applies an extended fuzzy SVM classifier for class-imbalanced datasets (FSVMCIL) together with the multilingual BERT (mBERT) text embedding model to produce the first hate label; (ii) it applies a morphological analysis method with the hate similarity (HS) scheme to detect implicit and explicit hate content and produce the second hate label. By linking both labeling methods, this research extracts contextual Bengali hate speech from informal text. The novel HS method uses the Word2Vec word embedding model and a Bengali hate lexicon. It also converts emojis to text for efficient contextual analysis. The study conducts extensive experiments across various categories of the Bengali hate speech dataset and evaluates the proposed model using weighted F1 score, precision, recall, and accuracy. Results reveal a significant improvement in Bengali hate speech detection, with a 2.35% increase in F1 score and a 9.11% increase in accuracy.
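The abstract reports a weighted F1 score, which matters for class-imbalanced data such as hate speech. As a minimal sketch of the standard support-weighted definition (not the authors' evaluation code), each class's F1 is weighted by its share of the true labels. The function name `weighted_f1` and the toy labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Hypothetical helper for illustration; not taken from the paper.
    """
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label in support:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Weight each class by its share of the true labels.
        score += support[label] / total * f1
    return score


# Toy, made-up labels: 2 "hate" and 4 "none" samples.
y_true = ["hate", "hate", "none", "none", "none", "none"]
y_pred = ["hate", "none", "none", "none", "none", "hate"]
result = weighted_f1(y_true, y_pred)
```

Because the majority class carries more weight, this score can stay high even when the minority (hate) class is poorly detected, which is why the abstract reports precision and recall alongside it.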

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3589001

Reference: 47 articles.

1. Statista Research Department. 2022. Number of monthly active Facebook users worldwide as of 2nd quarter 2022. Statista. Retrieved 29 July 2022 from https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/

2. Quantum Marketer. 2022. How Many People Use Twitter in 2022? (Twitter Statistics). Retrieved 29 July 2022 from https://quantummarketer.com/twitter-statistics/

3. Sayani Ghosal and Amita Jain. 2021. Research Journey of Hate Content Detection From Cyberspace. In Natural Language Processing for Global and Local Business (pp. 200-225). IGI Global. https://doi.org/10.4018/978-1-7998-4240-8.ch009

4. Light. 2022. Rising Levels of Hate Speech & Online Toxicity During This Time of Crisis. Retrieved 25 July 2022 from https://l1ght.com/Toxicity_during_coronavirus_Report-L1ght.pdf

Cited by 3 articles.

1. A transformer-based generative adversarial learning to detect sarcasm from Bengali text with correct classification of confusing text;Heliyon;2023-12

2. User-aware multilingual abusive content detection in social media;Information Processing & Management;2023-09

3. Natural Language Processing in Politics;Artificial Intelligence, Game Theory and Mechanism Design in Politics;2023