Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques

Published: 2024 Issue: 2 Volume: 21
ISSN: 2309-4524
Container-title: The International Arab Journal of Information Technology
Language: en
Short-container-title: IAJIT

Author:

Maree Mohammed, Eleyat Mujahed, Mesqali Enas

Abstract

With recent advances in Natural Language Processing (NLP), it is becoming increasingly feasible to process, analyze, and understand the sentiments that users express in reviews of the products and services they use. Despite these improvements, multilingual sentiment analysis has received little attention. This article presents a framework for sentiment analysis in Arabic and English using two datasets (ASTD and AJGT) along with their English translations. The preprocessing techniques applied include n-gram tokenization, Arabic-specific stop-word removal, punctuation removal, repeated-character removal, part-of-speech tagging, stemming, and lemmatization. Four machine learning classifiers are employed: Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM). We highlight existing specialized research in sentiment analysis for Arabic and English, as well as the techniques employed in each. Furthermore, the impact of each preprocessing step on accuracy is investigated through separate experiments for both languages. Experimental results on the ASTD dataset show similar performance across classifiers, with SVM achieving the highest accuracy of 70%. On the AJGT dataset, accuracy varied more across classifiers, with NB yielding the best accuracy at approximately 87%. Experiments on the datasets translated from Arabic to English showed no significant differences, although some features performed slightly better on the Arabic datasets.
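Several of the preprocessing steps named in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the stop-word list is a tiny hypothetical sample, the rule of collapsing runs of three or more identical characters to one is an assumption, and all function names are made up for illustration. Part-of-speech tagging, stemming, and lemmatization are omitted because they need language-specific tools.

```python
import re
import string

# Hypothetical, tiny stop-word sample for illustration only;
# the list used in the paper is not given in the abstract.
ARABIC_STOP_WORDS = {"في", "من", "على", "عن", "و"}

# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
PUNCTUATION = string.punctuation + "،؛؟"
_PUNCT_TABLE = str.maketrans("", "", PUNCTUATION)


def remove_punctuation(text):
    """Delete every punctuation character listed in PUNCTUATION."""
    return text.translate(_PUNCT_TABLE)


def collapse_repeats(text):
    """Collapse runs of 3+ identical characters to one (assumed rule)."""
    return re.sub(r"(.)\1{2,}", r"\1", text)


def remove_stop_words(tokens, stop_words=ARABIC_STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens, n):
    """Return all contiguous word n-grams, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text, n=2):
    """Clean the text, then return unigrams followed by word n-grams."""
    cleaned = collapse_repeats(remove_punctuation(text))
    tokens = remove_stop_words(cleaned.split())
    return tokens + ngrams(tokens, n)
```

The resulting token and n-gram lists could then be turned into feature vectors for classifiers such as LR, RF, NB, or SVM. Running each step separately, as the abstract describes, would mean calling only one of these functions at a time before feature extraction.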

Publisher

Zarqa University

Cited by 4 articles.

1. Narwhal Optimizer: A Novel Nature-Inspired Metaheuristic Algorithm;The International Arab Journal of Information Technology;2024

2. Analyzing Sentiments using Optimized Novel Ensemble Fuzzy and DL based Approach with Efficient Feature Selection and Extraction Models;The International Arab Journal of Information Technology;2024

3. RSO based Optimization of Random Forest Classifier for Fault Detection and Classification in Photovoltaic Arrays;The International Arab Journal of Information Technology;2024

4. An Improved Classification Model for English Syntax Error Correction Design of DL Algorithm;The International Arab Journal of Information Technology;2024