A deep CNN architecture with novel pooling layer applied to two Sudanese Arabic sentiment data sets

Authors:

Mhamed Mustafa¹, Sutcliffe Richard², Quteineh Husam³, Sun Xia⁴, Almekhlafi Eiad⁴, Retta Ephrem Afele⁴, Feng Jun⁴

Affiliations:

1. School of Information Science and Technology, Northwest University, China; College of Information and Electrical Engineering, China Agricultural University, China

2. School of Information Science and Technology, Northwest University, China; School of Computer Science and Electronic Engineering, University of Essex, UK

3. Business and Local Government Data Research Centre, School of Computer Science and Electronic Engineering (CSEE), University of Essex, UK

4. School of Information Science and Technology, Northwest University, China

Abstract

Arabic sentiment analysis has become an important research field in recent years. Initially, work focused on Modern Standard Arabic (MSA), which is the most widely used form. Since then, work has been carried out on several dialects, including Egyptian, Levantine and Moroccan, and a number of data sets have been created to support it. However, until now no work has been carried out on Sudanese Arabic, a dialect with 32 million speakers. In this article, two new public data sets are introduced: the two-class Sudanese Sentiment Data set (SudSenti2) and the three-class Sudanese Sentiment Data set (SudSenti3). In the data-preparation phase, we also compile a Sudanese stopword list. Furthermore, we propose a convolutional neural network (CNN) architecture, Sentiment Convolutional MMA (SCM), comprising five CNN layers together with a novel Mean Max Average (MMA) pooling layer for extracting the most informative features. Applied to SudSenti2 and SudSenti3, SCM outperforms the baseline models, achieving accuracies of 92.25% and 85.23%, respectively (Experiments 1 and 2). MMA is compared with Max, Avg and Min pooling and shown to outperform them on SudSenti2, the Saudi Sentiment Data set and the MSA Hotel Arabic Review Data set by 1.00%, 0.83% and 0.74%, respectively (Experiment 3). Next, an ablation study measures how much text normalisation and the Sudanese stopword list each contribute to performance (Experiment 4): normalisation accounts for a difference of 0.43% on the two-class task and 0.45% on the three-class task, and the custom stoplist for 0.82% and 0.72%, respectively. Finally, the model is compared with other deep learning classifiers, including transformer-based language models for Arabic, and shown to be comparable on SudSenti2 (Experiment 5).
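
The abstract names the pooling operators compared in Experiment 3 but does not define MMA. The pure-Python sketch below illustrates 1-D window pooling as typically applied to a text CNN feature map and contrasts Max, Avg and Min with a stand-in MMA reducer. The names `pool1d` and `mma_reduce`, and the assumption that MMA takes the mean of a window's maximum and its average, are hypothetical illustrations, not the authors' published definition.

```python
from typing import Callable, List, Sequence

Reducer = Callable[[Sequence[float]], float]


def pool1d(xs: Sequence[float], size: int, stride: int, reduce: Reducer) -> List[float]:
    """Slide a window of length `size` over `xs` in steps of `stride`
    and collapse each full window to one value with `reduce`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [reduce(xs[i:i + size]) for i in range(0, len(xs) - size + 1, stride)]


def max_reduce(w: Sequence[float]) -> float:
    return max(w)


def avg_reduce(w: Sequence[float]) -> float:
    return sum(w) / len(w)


def min_reduce(w: Sequence[float]) -> float:
    return min(w)


def mma_reduce(w: Sequence[float]) -> float:
    # HYPOTHETICAL stand-in for MMA: the mean of the window's maximum and
    # its average. This is an illustrative assumption, not the paper's formula.
    return (max_reduce(w) + avg_reduce(w)) / 2


if __name__ == "__main__":
    # Toy single-channel feature map, e.g. one convolution filter's
    # activations across six token positions.
    feature_map = [0.2, 0.9, 0.4, 0.1, 0.6, 0.3]
    for name, fn in [("Max", max_reduce), ("Avg", avg_reduce),
                     ("Min", min_reduce), ("MMA (assumed)", mma_reduce)]:
        print(name, pool1d(feature_map, size=2, stride=2, reduce=fn))
```

Under this assumption the MMA value of every window lies between its Avg and Max values, blending the strongest activation with the overall window response; whether the published layer behaves this way should be checked against the full article.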

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Cited by 4 articles.

1. Natural Language Processing for Arabic Sentiment Analysis: A Systematic Literature Review. IEEE Transactions on Big Data, 2024-10.

2. Sentiment Analysis of Arabic Dialects: A Review Study. Communications in Computer and Information Science, 2024.

3. Exploring the Sustainable Development Path of College Volunteerism with Voluntarism in the Context of Deep Learning. Applied Mathematics and Nonlinear Sciences, 2023-12-16.

4. Ensemble Stacking Model for Sentiment Analysis of Emirati and Arabic Dialects. Journal of King Saud University - Computer and Information Sciences, 2023-09.
