Sentiment analysis of imbalanced Arabic data using sampling techniques and classification algorithms
-
Published:2024-02-01
Issue:1
Volume:13
Page:607-618
-
ISSN:2302-9285
-
Container-title:Bulletin of Electrical Engineering and Informatics
-
language:
-
Short-container-title:Bulletin EEI
Author:
Al-Khazaleh Maisa J.,
Alian Marwah,
Jaradat Manar A.
Abstract
Sentiment analysis is a popular natural language processing task that recognizes the opinions or feelings expressed in a piece of text. Microblogging platforms such as Twitter are a valuable resource for finding people’s opinions. Most Arabic sentiment analysis studies report that the data used to train their machine learning algorithms is balanced. In this paper, we investigated the impact of sampling techniques and classification algorithms on an imbalanced Arabic dataset about people’s perceptions of COVID-19, in which the majority of opinions reflect people’s fear and stress about the pandemic and the minority reflect the belief that the pandemic was a hoax. The experiments analyzed imbalanced learning of Arabic sentiments by applying over-sampling and under-sampling techniques to seven single machine learning algorithms and two common ensemble algorithms, one from the bagging family and one from the boosting family. Results show that resampling-based approaches can overcome the difficulty of an imbalanced dataset, and that over-sampled data leads to better performance than under-sampled data. The results also reveal that using over-sampled data from the synthetic minority over-sampling technique (SMOTE), borderline-SMOTE, or adaptive synthetic sampling with a random forest classifier is the most effective in addressing this classification problem, with an F1-score of 0.99.
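The core idea behind SMOTE, as named in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy two-dimensional points (standing in for vectorized tweets) are assumptions made for illustration. The sketch covers only the over-sampling step, in which each synthetic minority point is placed on the line segment between a minority sample and one of its nearest minority neighbours; the classifier stage is omitted.

```python
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration; a simplified version of SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy imbalanced data (assumed): 20 majority points, 4 minority points.
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5), (11.5, 11.0)]

# Generate enough synthetic minority points to balance the two classes.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)

X = majority + minority + new_points
y = [0] * len(majority) + [1] * (len(minority) + len(new_points))
counts = Counter(y)
print(counts)
```

In practice a maintained library implementation would normally be preferred over a hand-written one, and resampling should be applied to the training split only so that synthetic points do not leak into the evaluation data.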
Publisher
Institute of Advanced Engineering and Science
Subject
Electrical and Electronic Engineering, Control and Optimization, Computer Networks and Communications, Hardware and Architecture, Instrumentation, Information Systems, Control and Systems Engineering, Computer Science (miscellaneous)
Cited by
1 article.