A deep CNN architecture with novel pooling layer applied to two Sudanese Arabic sentiment data sets

Published: 2023-10-21
ISSN: 0165-5515
Journal: Journal of Information Science
Language: English

Authors:

Mhamed Mustafa¹, Sutcliffe Richard², Quteineh Husam³, Sun Xia⁴, Almekhlafi Eiad⁴, Retta Ephrem Afele⁴, Feng Jun⁴

Affiliations:

1. School of Information Science and Technology, Northwest University, China; College of Information and Electrical Engineering, China Agricultural University, China

2. School of Information Science and Technology, Northwest University, China; School of Computer Science and Electronic Engineering, University of Essex, UK

3. Business and Local Government Data Research Centre, School of Computer Science and Electronic Engineering (CSEE), University of Essex, UK

4. School of Information Science and Technology, Northwest University, China

Abstract

Arabic sentiment analysis has become an important research field in recent years. Initially, work focused on Modern Standard Arabic (MSA), which is the most widely used form. Since then, work has been carried out on several dialects, including Egyptian, Levantine and Moroccan, and a number of data sets have been created to support it. However, until now, no work has been carried out on Sudanese Arabic, a dialect with 32 million speakers. In this article, two new public data sets are introduced: the two-class Sudanese Sentiment Data set (SudSenti2) and the three-class Sudanese Sentiment Data set (SudSenti3). In the preparation phase, we establish a Sudanese stopword list. Furthermore, we propose a convolutional neural network (CNN) architecture, Sentiment Convolutional MMA (SCM), comprising five CNN layers together with a novel Mean Max Average (MMA) pooling layer to extract the most informative features. SCM is applied to SudSenti2 and SudSenti3 and shown to be superior to the baseline models, with accuracies of 92.25% and 85.23%, respectively (Experiments 1 and 2). MMA pooling is compared with Max, Avg and Min pooling and shown to be better on SudSenti2, the Saudi Sentiment Data set and the MSA Hotel Arabic Review Data set by 1.00%, 0.83% and 0.74%, respectively (Experiment 3). Next, an ablation study determines how much text normalisation and the Sudanese stopword list contribute to performance (Experiment 4): normalisation accounts for a difference of 0.43% on the two-class data set and 0.45% on the three-class data set, and the custom stopword list for 0.82% and 0.72%, respectively. Finally, the model is compared with other deep learning classifiers, including transformer-based language models for Arabic, and shown to be comparable on SudSenti2 (Experiment 5).
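The abstract compares four pooling operators (Max, Avg, Min and the proposed MMA) but does not define MMA. As a minimal sketch of how such operators differ, the Python below applies 1D window pooling to a toy feature sequence. The function names, the window size and stride, and especially `mma_pool_hypothetical` are assumptions for illustration only: here MMA is assumed to average each window's maximum and mean, which may not match the operator defined in the paper.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# A reducer maps one pooling window to a single value.
Reducer = Callable[[Sequence[float]], float]


def max_pool(window: Sequence[float]) -> float:
    """Standard max pooling: keep the strongest activation."""
    return max(window)


def avg_pool(window: Sequence[float]) -> float:
    """Standard average pooling: keep the mean activation."""
    return fmean(window)


def min_pool(window: Sequence[float]) -> float:
    """Min pooling: keep the weakest activation."""
    return min(window)


def mma_pool_hypothetical(window: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in for Mean Max Average (MMA) pooling.

    ASSUMPTION: averages the window maximum and the window mean.
    The abstract does not define MMA, so the paper's operator may differ.
    """
    return (max(window) + fmean(window)) / 2


def pool1d(features: Sequence[float], size: int, stride: int,
           reduce: Reducer) -> List[float]:
    """Slide a window of `size` over `features` with step `stride`
    (no padding) and reduce each window to a single value."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(features) < size:
        raise ValueError("sequence is shorter than the pooling window")
    return [reduce(features[i:i + size])
            for i in range(0, len(features) - size + 1, stride)]


if __name__ == "__main__":
    # Toy activations along the token axis of one CNN feature map.
    feature_map = [2, 9, 4, 1, 6, 3]
    for name, op in [("max", max_pool), ("avg", avg_pool),
                     ("min", min_pool), ("mma (assumed)", mma_pool_hypothetical)]:
        print(name, pool1d(feature_map, size=2, stride=2, reduce=op))
```

With window size 2 and stride 2, the toy map [2, 9, 4, 1, 6, 3] pools to [9, 4, 6] under Max, [5.5, 2.5, 4.5] under Avg, [2, 1, 3] under Min and [7.25, 3.25, 5.25] under the assumed MMA, which by construction always lies between the Avg and Max results.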

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/01655515231188341

Cited by 4 articles.

1. Natural Language Processing for Arabic Sentiment Analysis: A Systematic Literature Review; IEEE Transactions on Big Data; 2024-10

2. Sentiment Analysis of Arabic Dialects: A Review Study; Communications in Computer and Information Science; 2024

3. Exploring the Sustainable Development Path of College Volunteerism with Voluntarism in the Context of Deep Learning; Applied Mathematics and Nonlinear Sciences; 2023-12-16

4. Ensemble Stacking Model for Sentiment Analysis of Emirati and Arabic Dialects; Journal of King Saud University - Computer and Information Sciences; 2023-09