Affiliation:
1. Department of Computer Science, AlZaytoonah University, Amman, Jordan
2. Department of Computer Information System, AlZaytoonah University, Amman, Jordan
3. Department of Power and Mechatronics Engineering, Tafila Technical University, Tafilah, Jordan
4. Department of Computer Information System, AlZaytoonah University, Amman, Jordan
Abstract
Introduction:
Stemming is an important preprocessing step in text classification and can contribute to higher
classification accuracy. Although many stemmers have been proposed for the English language, few have
been proposed for Arabic text. The Arabic language has gained increasing attention in recent decades, and there is
a vital need to further improve Arabic text classification.
Method:
This work combined the recently proposed P-Stemmer with various classifiers to find the optimal
classifier for the P-Stemmer in terms of Arabic text classification. As part of this work, a synthesized dataset was collected.
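The abstract does not list the P-Stemmer's rules. As a rough illustration of where a stemmer sits in this pipeline, the sketch below applies a hypothetical prefix-only stemmer to each token before feature extraction. The prefix list `PREFIXES`, the minimum stem length `MIN_STEM_LEN`, and the helpers `stem` and `preprocess` are assumptions made for illustration, not the published P-Stemmer rule set.

```python
# Hypothetical prefix list: an illustrative assumption, NOT the published P-Stemmer rules.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")
MIN_STEM_LEN = 2  # assumed: never strip a prefix if fewer letters than this would remain


def stem(word: str) -> str:
    """Strip the longest matching prefix, keeping at least MIN_STEM_LEN letters."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def preprocess(text: str) -> list[str]:
    """Whitespace-tokenize, then stem every token before it becomes a classifier feature."""
    return [stem(token) for token in text.split()]


print(preprocess("والكتاب الكتاب بالكتاب"))  # → ['كتاب', 'كتاب', 'كتاب']
```

Collapsing surface forms such as الكتاب, والكتاب and بالكتاب into a single feature is what shrinks the vocabulary a downstream classifier has to learn from; a real stemmer also needs safeguards against over-stemming words whose first letter merely looks like a prefix.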
Result:
The experiments show that using the P-Stemmer has a positive effect on classification. The degree of
improvement was classifier-dependent, which is reasonable as classifiers vary in their methodologies. Moreover, the
experiments show that the best classifier with the P-Stemmer was Naïve Bayes (NB). This is an interesting result, as this classifier is well known for its fast learning and classification time.
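One reason NB trains quickly is that a multinomial NB model is fit in a single counting pass over the corpus, and prediction is a sum of precomputable log-probabilities. The sketch below assumes the multinomial variant with add-one (Laplace) smoothing, a hypothetical class name `NaiveBayesSketch`, and hypothetical toy documents that are already stemmed; the abstract does not state which NB variant or toolkit the experiments used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Hypothetical multinomial Naive Bayes with add-one smoothing over token counts."""

    def fit(self, docs, labels):
        # Training is one counting pass: class frequencies and per-class token frequencies.
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.token_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, tokens):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.token_counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy corpus of already-stemmed tokens.
docs = [["مباراة", "فريق"], ["فريق", "هدف"], ["سوق", "سعر"], ["سعر", "بنك"]]
labels = ["sports", "sports", "economy", "economy"]
model = NaiveBayesSketch().fit(docs, labels)
print(model.predict(["هدف", "فريق"]))  # → sports
```

Because fitting only increments counters, its cost grows linearly with the number of tokens, which is consistent with NB's reputation for fast learning; stemming helps further by merging counts that would otherwise be spread across inflected forms.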
Discussion:
First, continued optimization of the P-Stemmer is necessary to further
improve Arabic text categorization. This can be achieved by combining more classifiers with the stemmer, by optimizing
the other natural language processing steps, and by improving the set of stemming rules. Second, the lack of sufficient
Arabic datasets, especially large ones, is still an issue.
Conclusion:
In this work, an improved P-Stemmer was proposed by combining its use with various classifiers. To
evaluate its performance, and due to the lack of Arabic datasets, a novel Arabic dataset was synthesized from various
online news pages. Next, the P-Stemmer was combined with Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and K-Star.
Publisher:
Bentham Science Publishers Ltd.