Dialectal Detective: Leveraging Feature Selection Techniques to Unearth Moroccan Dialect Arabic

Author:

ABDELLAH AITELOULI (1), Ouahi Hassan (2), Cherrat El Mehdi (1), BEKKAR Abdellatif

Affiliation:

1. Ibn Zohr University

2. Ibn Zohr University, Ait Melloul

Abstract

In the evolving field of automated language processing, identifying Moroccan Arabic (Darija) in a multilingual environment remains a major challenge. This study addresses the detection of Moroccan Arabic within a binary dataset consisting of Standard Arabic and Darija expressions. Our methodology combines a set of feature selection techniques with well-known feature extraction techniques from natural language processing (NLP), such as TF-IDF, CBOW, and Word2Vec. We apply machine learning techniques, namely LASSO and decision and regression trees, and rely on semantic methods, mostly based on advanced encoders, to deepen our understanding of the distinctive linguistic features of each variety. We also examine statistical feature selection methods, such as ANOVA, the Pearson correlation coefficient, and mutual information, to add further layers of analysis. Finally, we emphasize dimensionality reduction through principal component analysis (PCA) and singular value decomposition (SVD), which preserves the essential structures of the high-dimensional linguistic space while contributing to the accurate detection of Moroccan Arabic dialects. The most stable results are obtained with the XGBoost algorithm combined with classical extraction methods and SVD, with a reasonable execution time. This research aims not only to reveal the specific subtleties of Arabic dialects, but also to open new directions for natural language processing in diverse, multilingual societies.
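To make two of the techniques named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting and mutual-information scoring of terms against a binary Darija / Standard Arabic label. It is not the authors' implementation: the four transliterated sentences are invented toy data, the function names `tfidf` and `mutual_information` are hypothetical, and the IDF variant (`log(N / df) + 1`) is an assumption, since the abstract does not specify one.

```python
import math
from collections import Counter

# Toy, transliterated sentences invented for illustration only
# (label 1 = Darija, label 0 = Standard Arabic). Not data from the study.
docs = [
    ("wash bghiti tmshi daba", 1),
    ("bghit nmshi daba", 1),
    ("hal turid an tadhhab alan", 0),
    ("urid an adhhab alan", 0),
]

def tfidf(texts):
    """Return one {term: weight} dict per text, with weight = tf * (log(N / df) + 1)."""
    tokenized = [t.split() for t in texts]
    n = len(tokenized)
    # Document frequency: number of texts containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({term: (count / len(toks)) * (math.log(n / df[term]) + 1.0)
                        for term, count in tf.items()})
    return vectors

def mutual_information(texts, labels):
    """Mutual information (in nats) between term presence and the binary label."""
    n = len(texts)
    present = [set(t.split()) for t in texts]
    vocab = set().union(*present)
    scores = {}
    for term in vocab:
        mi = 0.0
        for x in (False, True):      # term absent / present
            for y in (0, 1):         # class label
                joint = sum(1 for p, l in zip(present, labels)
                            if (term in p) == x and l == y) / n
                px = sum(1 for p in present if (term in p) == x) / n
                py = sum(1 for l in labels if l == y) / n
                if joint > 0:
                    mi += joint * math.log(joint / (px * py))
        scores[term] = mi
    return scores

texts = [t for t, _ in docs]
labels = [l for _, l in docs]
weights = tfidf(texts)
mi_scores = mutual_information(texts, labels)
# Keep the three terms that are most informative about the label.
top_terms = sorted(mi_scores, key=mi_scores.get, reverse=True)[:3]
```

In this toy corpus, terms that occur in every sentence of one class and in none of the other receive the highest mutual-information score, which is the behaviour a filter-style feature selection step relies on. A real pipeline would apply the same idea to a much larger vocabulary before dimensionality reduction and classification.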

Publisher

Research Square Platform LLC

