Applying a Character-Level Model to a Short Arabic Dialect Sentence: A Saudi Dialect as a Case Study

Published: 2022-12-05 Issue: 23 Volume: 12 Page: 12435
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Alqurashi Tahani

Abstract

Arabic dialect identification (ADI) has recently drawn considerable interest among researchers in language recognition and natural language processing. This study investigated the use of a character-level model, which is effectively unrestricted in its vocabulary, to identify fine-grained Arabic dialects in short written texts. The study considered the four main dialects spoken across Saudi Arabia. The proposed ADI approach consists of five main phases: dialect data collection, data preprocessing and labelling, character-based feature extraction, model building (a deep learning character-based model and classical machine learning character-based models), and model performance evaluation. Several classical machine learning methods, including logistic regression, stochastic gradient descent, variations of the naive Bayes model, and support vector classification, were applied to the dataset. For deep learning, a character-level convolutional neural network (CNN) model was adapted with a bidirectional long short-term memory approach. The collected data were tested under various classification tasks, including two-, three- and four-way ADI tasks. The results revealed that the classical machine learning algorithms outperformed the CNN approach. Moreover, term frequency–inverse document frequency (TF-IDF) weighting combined with character n-grams ranging from unigrams to four-grams achieved the best performance among the tested parameters.
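The best-performing feature setup named in the abstract, TF-IDF over character n-grams from unigrams to four-grams, can be illustrated with a minimal pure-Python sketch. This is not the author's implementation: the function names (`char_ngrams`, `tfidf_vectors`, `cosine`), the smoothed idf formula, the L2 normalisation, and the toy input strings are all assumptions chosen for illustration. The classifiers themselves are left out.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """Return every character n-gram of `text` for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def tfidf_vectors(texts, n_min=1, n_max=4):
    """Map each text to a sparse dict {n-gram: TF-IDF weight}, L2-normalised."""
    counts = [Counter(char_ngrams(t, n_min, n_max)) for t in texts]
    n_docs = len(texts)
    # Document frequency: number of texts in which each n-gram occurs.
    doc_freq = Counter(gram for c in counts for gram in c)
    vectors = []
    for c in counts:
        # Assumed idf variant: ln((1 + N) / (1 + df)) + 1 (smoothed).
        vec = {
            gram: tf * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0)
            for gram, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * v.get(gram, 0.0) for gram, w in u.items())


if __name__ == "__main__":
    # Placeholder strings, not real dialect data; Arabic text works the
    # same way because Python strings are sequences of Unicode characters.
    toy_texts = ["abab", "abba", "zzzz"]
    vecs = tfidf_vectors(toy_texts)
    print(cosine(vecs[0], vecs[1]))  # texts sharing some n-grams
    print(cosine(vecs[0], vecs[2]))  # texts sharing no n-grams
```

Because the features are built from characters and not from a fixed word list, any short text yields a vector, which is one reading of the abstract's "effectively unrestricted in its vocabulary". Vectors of this kind could then be passed to a classifier such as logistic regression or a support vector classifier.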

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/23/12435/pdf

References (35 articles)

1. United Nations Educational, Scientific and Cultural Organization (UNESCO) (2020, June 13). World Arabic Language Day. Available online: https://en.unesco.org/commemorations/worldarabiclanguageday.

2. General Authority for Statistics, Kingdom of Saudi Arabia (2020, June 13). Saudi Census. Available online: https://www.stats.gov.sa/en.

3. Guellil (2019). Arabic natural language processing: An overview. J. King Saud Univ.-Comput. Inf. Sci.

4. Jauhiainen (2019). Automatic language identification in texts: A survey. J. Artif. Intell. Res.

5. Malmasi, S., Refaee, E., and Dras, M. (2015, January 19–21). Arabic dialect identification using a parallel multidialectal corpus. Proceedings of the Conference of the Pacific Association for Computational Linguistics, Bali, Indonesia.

Cited by 4 articles.

1. Literature Review: NLP Techniques for Arabic Dialect Recognition;2024 International Conference on Circuit, Systems and Communication (ICCSC);2024-06-28

2. Enhancing Arabic Dialect Detection on Social Media: A Hybrid Model with an Attention Mechanism;Information;2024-05-28

3. Features and Methods;Synthesis Lectures on Human Language Technologies;2024

4. Special Issue “Recent Trends in Natural Language Processing and Its Applications”;Applied Sciences;2023-06-19