Automatic Hate Speech Detection in English-Odia Code Mixed Social Media Data Using Machine Learning Techniques

Published: 2021-09-15 Issue: 18 Volume: 11 Page: 8575
ISSN: 2076-3417
Container-title: Applied Sciences
language: en
Short-container-title: Applied Sciences

Author:

Mohapatra Sudhir Kumar, Prasad Srinivas, Bebarta Dwiti Krishna, Das Tapan Kumar, Srinivasan Kathiravan, Hu Yuh-Chung

Abstract

Hate speech on social media can spread quickly among online users and may subsequently escalate into local violence and heinous crimes. This paper proposes a hate speech detection model based on machine learning and text-mining feature extraction techniques. The authors collected English-Odia code-mixed hate speech data from a public Facebook page and manually labeled them into three classes. To obtain both a binary and a ternary dataset, the three-class data were additionally converted into two classes. Each model combines a machine learning algorithm with a feature extraction method. Support vector machine (SVM), naïve Bayes (NB), and random forest (RF) models were trained using the whole dataset, with features based on word unigrams, bigrams, trigrams, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF, and word2vec, for both datasets. Using the two datasets, two kinds of models were developed with each feature: binary models and ternary models. The SVM models with word2vec features outperformed the NB and RF models in both the binary and ternary settings. The results show that the ternary models exhibited less confusion between hate and non-hate speech than the binary models.
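
To make the feature types named in the abstract concrete, the following is a minimal sketch of combined word n-grams (unigrams through trigrams) weighted by TF-IDF, written in plain Python. It is not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenizer, the TF-IDF variant (relative term frequency times the natural log of N/df), and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of order n_min..n_max (combined n-grams)."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Map each document to a dict of n-gram -> TF-IDF weight.

    Assumed variant: tf = count / number of n-grams in the document,
    idf = ln(N / df). The paper may use a different weighting.
    """
    grams_per_doc = [word_ngrams(d, n_min, n_max) for d in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    n_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        vectors.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return vectors


# Placeholder sentences for illustration only, not taken from the paper's dataset.
docs = ["you are a good person", "you are a bad person", "have a good day"]
vectors = tfidf_vectors(docs)
```

Under this weighting, an n-gram that occurs in every document gets weight zero, while n-grams specific to one document get the largest weights. The resulting sparse vectors are the kind of input a classifier such as SVM, NB, or RF would be trained on.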

Funder

Ministry of Science and Technology, Taiwan

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/18/8575/pdf

References: 44 articles.

1. Automated Classification of Evidence of Respect in the Communication through Twitter

2. Analysis of the Effectiveness of Promotion Strategies of Social Platforms for the Elderly with Different Levels of Digital Literacy

3. Analyzing the dynamics of communication in online social networks;De Choudhury,2010

Cited by 21 articles.

1. Mapping the scientific knowledge and approaches to defining and measuring hate crime, hate speech, and hate incidents: A systematic review;Campbell Systematic Reviews;2024-04-28

2. A survey on multi-lingual offensive language detection;PeerJ Computer Science;2024-03-29

3. Electric bus arrival and charging station placement assessment using machine learning techniques;International Journal of Sustainable Engineering;2024-03-27

4. A feature fusion and detection approach using deep learning for sentimental analysis and offensive text detection from code-mix Malayalam language;Biomedical Signal Processing and Control;2024-03

5. Evaluating Machine Learning Models for Hate Speech Detection in ODIA Language;2024 1st International Conference on Cognitive, Green and Ubiquitous Computing (IC-CGU);2024-03-01