Improving Arabic Text Classification Using P-Stemmer

Author:

Kanan Tarek (1), Hawashin Bilal (2), Alzubi Shadi (1), Almaita Eyad (3), Alkhatib Ahmad (2), Maria Khulood Abu (4), Elbes Mohammed (1)

Affiliation:

1. Department of Computer Science, AlZaytoonah University, Amman, Jordan

2. Department of Computer Information System, AlZaytoonah University, Amman, Jordan

3. Department of Power and Mechatronics Engineering, Tafila Technical University, Tafilah, Jordan

4. Department of Computer Information System, AlZaytoonah University, Amman, Jordan

Abstract

Introduction: Stemming is an important preprocessing step in text classification and can help increase classification accuracy. Although many stemmers have been proposed for English, few have been proposed for Arabic text. Arabic has gained increasing attention in recent decades, and there is a pressing need to further improve Arabic text classification.

Method: This work combined the recently proposed P-Stemmer with various classifiers to find the best classifier to pair with the P-Stemmer for Arabic text classification. As part of this work, a synthesized dataset was collected.

Result: The experiments show that the use of the P-Stemmer has a positive effect on classification. The degree of improvement was classifier-dependent, which is reasonable as classifiers vary in their methodologies. Moreover, the experiments show that the best classifier with the P-Stemmer was Naïve Bayes (NB). This is an interesting result, as this classifier is well known for its fast learning and classification time.

Discussion: First, continued improvement of the P-Stemmer through further optimization steps is necessary to further improve Arabic text categorization. This can be done by combining more classifiers with the stemmer, by optimizing the other natural language processing steps, and by improving the set of stemming rules. Second, the lack of sufficient Arabic datasets, especially large ones, is still an issue.

Conclusion: In this work, an improved P-Stemmer was proposed by combining its use with various classifiers. To evaluate its performance, and due to the lack of Arabic datasets, a novel Arabic dataset was synthesized from various online news pages. Next, the P-Stemmer was combined with Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and K-Star.
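The pairing of a stemmer with a classifier described in the Method can be illustrated with a minimal sketch. It assumes the P-Stemmer behaves as a prefix-stripping light stemmer; the abstract does not spell out its rules, so the `PREFIXES` list, the `strip_prefix` helper, the `min_stem_len` threshold, and the four toy training sentences below are all hypothetical and are not the published rule set or dataset. The classifier is a small multinomial Naïve Bayes with add-one smoothing, written with the standard library only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical prefix list for illustration only; NOT the published P-Stemmer rules.
# Longer prefixes come first so they are tried before their shorter substrings.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")


def strip_prefix(token, min_stem_len=3):
    """Remove the first matching prefix if the remaining stem is long enough."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_stem_len:
            return token[len(prefix):]
    return token


def tokenize(text):
    """Whitespace tokenization followed by prefix stripping."""
    return [strip_prefix(tok) for tok in text.split()]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        tokens = tokenize(doc)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this sketch, not the paper's dataset).
train_docs = [
    "الفريق فاز بالمباراة",
    "اللاعب سجل هدف",
    "البنك رفع الفائدة",
    "السوق والاسهم ارتفعت",
]
train_labels = ["sports", "sports", "economy", "economy"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("مباراة فريق"))
print(model.predict("الاسهم والبنك"))
```

In this toy setup the test sentences share no surface forms with the training sentences; they only match after prefix stripping maps, for example, "بالمباراة" and "مباراة" to the same token. That is the kind of effect a stemmer is expected to have on a bag-of-words classifier, though the size of the benefit on real data depends on the classifier, as the Result section notes.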

Publisher

Bentham Science Publishers Ltd.

Subject

General Computer Science

Cited by 7 articles.

1. Arabic text classification based on analogical proportions;Expert Systems;2024-06-17

2. A Proposed Technique for Business Process Modeling Diagram Using Natural Language Processing;2023 International Conference on Information Technology (ICIT);2023-08-09

3. Developing off-chain system interfaces in health and pharmaceutical blockchain applications;PROCEEDINGS OF THE 4TH INTERNATIONAL COMPUTER SCIENCES AND INFORMATICS CONFERENCE (ICSIC 2022);2023

4. A Review Study on Arabic Text Classification;2022 International Arab Conference on Information Technology (ACIT);2022-11-22

5. EHHR: an efficient evolutionary hyper-heuristic based recommender framework for short-text classifier selection;Cluster Computing;2022-10-10
