An Experimental Study for the Effect of Stop Words Elimination for Arabic Text Classification Algorithms

Published:2011-04 Issue:2 Volume:6 Page:68-75
ISSN:1554-1045
Container-title:International Journal of Information Technology and Web Engineering
language:en

Author:

Al-Shargabi Bassam¹,Olayah Fekry¹,Romimah Waseem AL²

Affiliation:

1. Isra University, Jordan

2. University of Science and Technology, Yemen

Abstract

This paper presents an experimental study of three techniques for Arabic text classification: Support Vector Machine (SVM) with Sequential Minimal Optimization (SMO), Naïve Bayesian (NB), and J48. It assesses the accuracy of each classifier and determines which classifier is most accurate for Arabic text classification with stop words eliminated. The accuracy of each classifier is measured with the percentage split (holdout) method and K-fold cross-validation, along with the time needed to classify Arabic text. The results show that the SMO classifier achieves the highest accuracy and the lowest error rate, and that the time needed to build the SMO model is much lower than for the other classification techniques.
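
The evaluation setup described in the abstract (stop-word elimination followed by K-fold cross-validation of a classifier) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: a simple multinomial Naive Bayes stands in for the NB classifier, and the stop-word list, the toy corpus, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word list (assumption: NOT the list used in the paper).
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]

def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i tests on indices i, i+k, i+2k, ..."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test

def train_nb(docs, labels):
    """Collect per-class document counts and word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    vocab_size = max(len(vocab), 1)  # guard against an empty vocabulary
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for t in tokens:
            score += math.log((word_counts[label][t] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_val_accuracy(docs, labels, k):
    """Fraction of documents classified correctly across all k folds."""
    correct = 0
    for train, test in k_fold_indices(len(docs), k):
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test)
    return correct / len(docs)

# Hypothetical toy corpus: three "sport" and three "economy" documents.
RAW_DOCS = [
    ["مباراة", "في", "الملعب"],
    ["فريق", "من", "الملعب"],
    ["مباراة", "فريق"],
    ["سوق", "في", "البنك"],
    ["البنك", "من", "سوق"],
    ["سوق", "البنك"],
]
LABELS = ["sport", "sport", "sport", "economy", "economy", "economy"]

# Preprocess once, then evaluate with 3-fold cross-validation.
CLEAN_DOCS = [remove_stop_words(doc) for doc in RAW_DOCS]

if __name__ == "__main__":
    print("3-fold accuracy:", cross_val_accuracy(CLEAN_DOCS, LABELS, k=3))
```

A percentage split (holdout) evaluation follows the same pattern with a single train/test partition instead of k rotating folds. Comparing `cross_val_accuracy(RAW_DOCS, ...)` against `cross_val_accuracy(CLEAN_DOCS, ...)` on a real corpus would mirror the kind of with/without stop-word comparison the study is concerned with.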

Publisher

IGI Global

Subject

General Computer Science

Reference: 14 articles.

1. Abo Alkhair, A. (2006). Effect of stop words removing for Arabic information retrieval. International Journal of Computing & Information Science.

2. Al-Harbi, S., Almuhareb, A., Al-Thubaity, A., Khorsheed, M. S., & Al-Rajeh, A. (2008). Automatic Arabic text classification. In Proceedings of the 9th International Conference on the Statistical Analysis of Textual Data, Lyon, France.

3. Al-Shalabi, R., Kanaan, G., Jaam, J. M., Hasnah, A., & Hilat, E. (2004). Stop-word removal algorithm for Arabic language. In Proceedings of 1st International Conference on Information & Communication Technologies: From Theory to Applications, Damascus, Syria (pp. 545-550).

4. El-Kourdi, M., Bensaid, A., & Rachidi, T. (2004). Automatic Arabic document categorization based on the naive Bayes algorithm. In Proceedings of the Workshop on Computational Approaches to Arabic Script Based Languages, Geneva, Switzerland (pp. 51-58).

5. El-Kourdi, M., Bensaid, A., & Rachidi, T. (2004, August). Automatic Arabic document categorization based on the Naïve Bayes algorithm. In Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland.

Cited by 10 articles.

1. A Study on Corpus-based Stopword Lists in Indian Language IR;ACM Transactions on Asian and Low-Resource Language Information Processing;2023-07-25

2. News image text classification algorithm with bidirectional encoder representations from transformers model;Journal of Electronic Imaging;2022-09-13

3. An effective approach for Arabic document classification using machine learning;Global Transitions Proceedings;2022-06

4. Effect of stopwords in Indian language IR;Sādhanā;2022-01-10

5. Effects of Light Stemming on Feature Extraction and Selection for Arabic Documents Classification;Studies in Computational Intelligence;2019-11-30