Author:
Fauzi M. Ali, Yuniarti Anny
Abstract
Due to the massive increase in user-generated web content, particularly on social media networks where anyone can post freely without restriction, the amount of hateful activity is also increasing. Social media and microblogging web services, such as Twitter, allow user tweets to be read and analyzed in near real time. Twitter is a logical source of data for hate speech analysis, since Twitter users are more likely to express their emotions about an event by posting tweets. This analysis can help identify hate speech early so that it can be prevented from spreading widely. Manually classifying hateful content on Twitter is costly and not scalable. Therefore, an automatic method of hate speech detection needs to be developed for tweets in the Indonesian language. In this study, we used ensemble methods for hate speech detection in the Indonesian language. We employed five stand-alone classification algorithms (Naïve Bayes, K-Nearest Neighbours, Maximum Entropy, Random Forest, and Support Vector Machines) and two ensemble methods (hard voting and soft voting) on a Twitter hate speech dataset. The experimental results showed that using an ensemble method can improve classification performance. The best result was achieved with soft voting, with an F1 measure of 79.8% on the unbalanced dataset and 84.7% on the balanced dataset. Although the improvement is not large, using an ensemble method can reduce the risk of choosing a poor classifier for deciding whether new tweets are hate speech.
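The difference between the two ensemble methods named in the abstract can be illustrated with a minimal sketch. It follows the standard definitions (hard voting takes the majority of predicted labels, soft voting averages predicted class probabilities) and is not the authors' implementation. The label names, the function names `hard_vote` and `soft_vote`, and the probabilities for the five classifiers are all hypothetical values chosen for illustration.

```python
from collections import Counter
from statistics import mean

LABELS = ("non_hate", "hate")  # hypothetical label names


def hard_vote(probas):
    """Majority vote over each classifier's most probable label."""
    votes = [max(p, key=p.get) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    """Average class probabilities across classifiers, then pick the top class."""
    avg = {label: mean(p[label] for p in probas) for label in LABELS}
    return max(avg, key=avg.get)


# Invented probabilities for one tweet from five stand-alone classifiers.
# Three classifiers lean weakly towards "non_hate"; two are confident in "hate".
example = [
    {"non_hate": 0.55, "hate": 0.45},
    {"non_hate": 0.60, "hate": 0.40},
    {"non_hate": 0.52, "hate": 0.48},
    {"non_hate": 0.05, "hate": 0.95},
    {"non_hate": 0.10, "hate": 0.90},
]

print("hard voting:", hard_vote(example))
print("soft voting:", soft_vote(example))
```

In this invented case the two methods disagree: hard voting counts three weak votes against two, while soft voting lets the two confident classifiers outweigh them. Soft voting requires every base classifier to output probabilities, which hard voting does not.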
Publisher
Institute of Advanced Engineering and Science
Subject
Electrical and Electronic Engineering, Control and Optimization, Computer Networks and Communications, Hardware and Architecture, Information Systems, Signal Processing
Cited by
26 articles.
1. Hate Speech Classification in Indonesian Tweets Using TF-IDF and Data Augmentation;2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST);2024-01-17
2. Hate Speech Detection in Social Media Using Ensemble Method in Classifiers;Lecture Notes in Networks and Systems;2024
3. Cervical Cancer Prediction Using Machine Learning Techniques;Lecture Notes in Networks and Systems;2024
4. Ensemble Text Classification with TF-IDF Vectorization for Hate Speech Detection in Social Media;2023 International Conference on System, Computation, Automation and Networking (ICSCAN);2023-11-17
5. Hate Speech Recognition in Chilean Tweets;2023 42nd IEEE International Conference of the Chilean Computer Science Society (SCCC);2023-10-23