Affiliation:
1. Department of Computer Information Systems, Jordan University of Science and Technology, Irbid, Jordan
2. Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan
Abstract
Arabic is a challenging language for automatic processing. This is due to several intrinsic properties, such as its many dialects, ambiguous syntax, syntactic flexibility, and diacritics. Machine learning and deep learning frameworks require large training datasets to make accurate predictions. This leads to another challenge for researchers working with Arabic text, as high-quality Arabic textual datasets are still scarce. In this paper, an intelligent framework for expanding, or augmenting, Arabic sentences is presented. The seed sentences were initially labelled by human annotators for sentiment analysis. The novel approach presented in this work relies on the rich morphology of Arabic, synonymy lists, syntactic (grammatical) rules, and negation rules to generate new sentences, with their proper labels, from the seed sentences. Most augmentation techniques target image or video data. This study is the first work to target text augmentation for the Arabic language. Using this framework, we were able to increase the size of the initial seed datasets tenfold. Experiments that assess the impact of this augmentation on sentiment analysis showed a 42% average increase in accuracy, owing to the reliability and high quality of the rules used to build this framework.
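To make the idea of rule-based augmentation concrete, the following is a minimal sketch, not the authors' framework. It assumes a tiny, hypothetical synonym list and a single simplified negation rule (inserting the negator "غير" before a listed adjective and flipping the sentiment label). The names `SYNONYMS`, `NEGATOR`, `FLIP`, and `augment` are illustrative only. The morphological and syntactic rules mentioned in the abstract are not modelled here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources (assumptions for illustration only).
SYNONYMS: Dict[str, List[str]] = {
    "رائع": ["ممتاز", "مذهل"],  # "wonderful" -> "excellent", "amazing"
}
NEGATOR = "غير"  # "not", placed directly before an adjective
FLIP = {"positive": "negative", "negative": "positive"}


def augment(sentence: str, label: str) -> List[Tuple[str, str]]:
    """Generate labelled variants of one seed sentence.

    Synonym substitution keeps the seed label; the negation rule
    flips it (labels other than positive/negative are left unchanged).
    """
    tokens = sentence.split()
    variants: List[Tuple[str, str]] = []
    for i, token in enumerate(tokens):
        if token not in SYNONYMS:
            continue
        # Rule 1: replace the word with each listed synonym, same label.
        for synonym in SYNONYMS[token]:
            swapped = tokens[:i] + [synonym] + tokens[i + 1:]
            variants.append((" ".join(swapped), label))
        # Rule 2: negate the word and flip the sentiment label.
        negated = tokens[:i] + [NEGATOR, token] + tokens[i + 1:]
        variants.append((" ".join(negated), FLIP.get(label, label)))
    return variants


if __name__ == "__main__":
    # Seed: "the film is wonderful", labelled positive.
    for text, tag in augment("الفيلم رائع", "positive"):
        print(tag, text)
```

In this toy setting one seed sentence yields three labelled variants: two synonym substitutions that keep the positive label and one negated sentence labelled negative.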
Funder
Jordan University of Science and Technology, Jordan