Inculcating Context for Emoji Powered Bengali Hate Speech Detection using Extended Fuzzy SVM and Text Embedding Models

Author:

Ghosal Sayani (1), Jain Amita (2), Tayal Devendra Kumar (3), Menon Varun G. (4), Kumar Akshi (5)

Affiliation:

1. Netaji Subhas University of Technology East Campus (erstwhile A.I.A.C.T.R.), Guru Gobind Singh Indraprastha University, Delhi, India

2. Netaji Subhas University of Technology, Delhi, India

3. Indira Gandhi Delhi Technical University for Women, Delhi, India

4. SCMS School of Engineering and Technology, Kerala, India

5. Manchester Metropolitan University, Manchester, UK

Abstract

The massive growth of social webs offers opportunities to communicate in diverse languages through unstructured text, informal posts, misspelled content, and emojis. Social media users feel comfortable expressing their emotions, especially high-intensity emotions such as hate speech, in their mother tongue. Hate speech in any form targets groups and individuals and may trigger antisocial activities, hate crimes, and terrorist acts. Bengali social media users often post implicit or indirect hate text in Bengali. Existing Bengali hate speech detection research addresses explicit hate speech, but in practice hate is more often expressed implicitly. To detect both implicit and explicit hate speech in low-resource content, social webs need highly efficient automated tools. Researchers have applied discriminative learning approaches (such as SVM, MLP, and CNN) to distinguish hate text, but these give clear-cut outcomes only when detecting direct hate speech. The proposed novel Bengali hate speech detection model combines two parallel approaches: (i) an extended fuzzy SVM classifier for class-imbalanced datasets (FSVMCIL) with the multilingual BERT (mBERT) text embedding model, which produces the first hate label; and (ii) a morphological analysis method with the hate similarity (HS) scheme, which detects implicit and explicit hate content and produces the second hate label. By linking both labeling methods, this research extracts contextual Bengali hate speech from informal text. The novel HS method uses the Word2Vec word embedding model and a Bengali hate lexicon. It also converts emojis to text for efficient contextual analysis. The study conducts extensive experiments across various categories of the Bengali hate speech dataset and evaluates the proposed model using weighted F1 score, precision, recall, and accuracy. Results reveal a significant improvement in Bengali hate speech detection, with a 2.35% increase in F1 score and a 9.11% increase in accuracy.
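The abstract does not specify how the HS scheme is computed, so the following is only a minimal sketch of one plausible reading: emojis are first replaced by text, and each token is then compared against a hate lexicon by cosine similarity of word vectors. Everything concrete here is an assumption made for illustration: the emoji map, the English placeholder words, the 3-dimensional toy vectors standing in for trained Word2Vec embeddings, and the function names `emoji_to_text` and `hate_similarity`. It is not the authors' implementation.

```python
import math

# Hypothetical emoji-to-text map; the paper's actual mapping is not given in the abstract.
EMOJI_TO_TEXT = {"😡": "angry", "🔪": "knife"}

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings (assumed values).
TOY_VECTORS = {
    "angry": [0.9, 0.1, 0.0],
    "knife": [0.8, 0.2, 0.1],
    "kill": [1.0, 0.0, 0.0],
    "friend": [0.0, 1.0, 0.0],
    "hello": [0.1, 0.9, 0.2],
}

# Hypothetical hate lexicon; English placeholders stand in for Bengali entries.
HATE_LEXICON = ["kill"]


def emoji_to_text(text):
    """Replace each known emoji with its text label, padded by spaces."""
    for emoji, word in EMOJI_TO_TEXT.items():
        text = text.replace(emoji, f" {word} ")
    return text


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hate_similarity(text):
    """Highest cosine similarity between any known token and any lexicon entry."""
    tokens = emoji_to_text(text).lower().split()
    scores = [
        cosine(TOY_VECTORS[token], TOY_VECTORS[entry])
        for token in tokens
        if token in TOY_VECTORS
        for entry in HATE_LEXICON
    ]
    return max(scores, default=0.0)


if __name__ == "__main__":
    for post in ["hello friend", "hello 😡"]:
        print(post, round(hate_similarity(post), 3))
```

Taking the maximum over tokens means a single hostile emoji can raise the score of an otherwise neutral post, which is one way emoji conversion could add context. How the paper aggregates similarities and combines the result with the FSVMCIL label is not stated in the abstract.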

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 3 articles.