Author:
Udanor Collins,Anyanwu Chinatu C.
Abstract
Purpose
Hate speech in recent times has become a troubling development. It has different meanings to different people in different cultures. The anonymity and ubiquity of the social media provides a breeding ground for hate speech and makes combating it seems like a lost battle. However, what may constitute a hate speech in a cultural or religious neutral society may not be perceived as such in a polarized multi-cultural and multi-religious society like Nigeria. Defining hate speech, therefore, may be contextual. Hate speech in Nigeria may be perceived along ethnic, religious and political boundaries. The purpose of this paper is to check for the presence of hate speech in social media platforms like Twitter, and to what degree is hate speech permissible, if available? It also intends to find out what monitoring mechanisms the social media platforms like Facebook and Twitter have put in place to combat hate speech. Lexalytics is a term coined by the authors from the words lexical analytics for the purpose of opinion mining unstructured texts like tweets.
Design/methodology/approach
This research developed a Python software called polarized opinions sentiment analyzer (POSA), adopting an ego social network analytics technique in which an individual’s behavior is mined and described. POSA uses a customized Python N-Gram dictionary of local context-based terms that may be considered as hate terms. It then applied the Twitter API to stream tweets from popular and trending Nigerian Twitter handles in politics, ethnicity, religion, social activism, racism, etc., and filtered the tweets against the custom dictionary using unsupervised classification of the texts as either positive or negative sentiments. The outcome is visualized using tables, pie charts and word clouds. A similar implementation was also carried out using R-Studio codes and both results are compared and a t-test was applied to determine if there was a significant difference in the results. The research methodology can be classified as both qualitative and quantitative. Qualitative in terms of data classification, and quantitative in terms of being able to identify the results as either negative or positive from the computation of text to vector.
Findings
The findings from two sets of experiments on POSA and R are as follows: in the first experiment, the POSA software found that the Twitter handles analyzed contained between 33 and 55 percent hate contents, while the R results show hate contents ranging from 38 to 62 percent. Performing a t-test on both positive and negative scores for both POSA and R-studio, results reveal p-values of 0.389 and 0.289, respectively, on an α value of 0.05, implying that there is no significant difference in the results from POSA and R. During the second experiment performed on 11 local handles with 1,207 tweets, the authors deduce as follows: that the percentage of hate contents classified by POSA is 40 percent, while the percentage of hate contents classified by R is 51 percent. That the accuracy of hate speech classification predicted by POSA is 87 percent, while free speech is 86 percent. And the accuracy of hate speech classification predicted by R is 65 percent, while free speech is 74 percent. This study reveals that neither Twitter nor Facebook has an automated monitoring system for hate speech, and no benchmark is set to decide the level of hate contents allowed in a text. The monitoring is rather done by humans whose assessment is usually subjective and sometimes inconsistent.
Research limitations/implications
This study establishes the fact that hate speech is on the increase on social media. It also shows that hate mongers can actually be pinned down, with the contents of their messages. The POSA system can be used as a plug-in by Twitter to detect and stop hate speech on its platform. The study was limited to public Twitter handles only. N-grams are effective features for word-sense disambiguation, but when using N-grams, the feature vector could take on enormous proportions and in turn increasing sparsity of the feature vectors.
Practical implications
The findings of this study show that if urgent measures are not taken to combat hate speech there could be dare consequences, especially in highly polarized societies that are always heated up along religious and ethnic sentiments. On daily basis tempers are flaring in the social media over comments made by participants. This study has also demonstrated that it is possible to implement a technology that can track and terminate hate speech in a micro-blog like Twitter. This can also be extended to other social media platforms.
Social implications
This study will help to promote a more positive society, ensuring the social media is positively utilized to the benefit of mankind.
Originality/value
The findings can be used by social media companies to monitor user behaviors, and pin hate crimes to specific persons. Governments and law enforcement bodies can also use the POSA application to track down hate peddlers.
Subject
Library and Information Sciences,Information Systems
Reference92 articles.
1. Community member retrieval on social media using textual information,2018
2. A focused crawler for mining hate and extremism promoting videos on YouTube,2014
3. Network analysis tools,2015
4. Audience perception of hate speech and foul language in the social media in Nigeria: implications for morality and law;Academicus International Scientific Journal,2017
5. Albert, A.Y. (2018), “Notes on machine learning and A.I. generating N-grams from sentences python”, available at: www.albertauyeung.com/post/generating-ngrams-python/ (accessed June 3, 2018).
Cited by
12 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献