Abstract
The ultimate aim of Machine Learning (ML) is to make machines act like humans. In particular, ML algorithms are widely used to classify texts. Text classification is the process of assigning texts to a predefined set of categories based on their content, and it contributes to improving information retrieval on the Web. In this paper, we focus on Arabic text classification, since a large community worldwide uses this language. The Arabic text classification process consists of three main steps: preprocessing, feature extraction, and applying an ML algorithm. This paper presents a comparative empirical study to determine which combination of feature extraction technique and ML algorithm performs well on Arabic documents. To this end, we implemented 160 classifiers by combining 5 feature extraction techniques with 32 ML algorithms, and we made these classifiers open access for the benefit of the AI and NLP communities. Experiments were carried out on a large open dataset. The comparison reveals that the TFIDF-Perceptron combination is the best-performing classifier.
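To make the TFIDF-Perceptron combination concrete, the following is a minimal pure-Python sketch of TF-IDF feature extraction followed by a binary perceptron. It is an illustration under stated assumptions, not the classifiers released with the paper: the function names, the smoothed IDF formula, whitespace tokenization in place of real Arabic preprocessing, the two-class setup, and the four toy documents are all hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn a smoothed inverse document frequency for each term."""
    tokenized = [doc.split() for doc in docs]  # assumption: whitespace tokens only
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def transform(doc, idf):
    """Map a document to a sparse TF-IDF vector; unseen terms are ignored."""
    tokens = doc.split()
    tf = Counter(tokens)
    return {t: (f / len(tokens)) * idf[t] for t, f in tf.items() if t in idf}


def score(weights, bias, vec):
    """Linear decision function of the perceptron."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_perceptron(vectors, labels, epochs=10):
    """Classic perceptron rule: update only on misclassified examples."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):  # y is +1 or -1
            if y * score(weights, bias, vec) <= 0:
                for t, v in vec.items():
                    weights[t] = weights.get(t, 0.0) + y * v
                bias += y
    return weights, bias


def predict(weights, bias, vec):
    return 1 if score(weights, bias, vec) > 0 else -1


# Hypothetical toy corpus: +1 = sports, -1 = economy.
train_docs = [
    "مباراة كرة القدم فريق",
    "فريق يفوز مباراة",
    "بنك سوق اقتصاد",
    "اقتصاد سوق اسهم",
]
train_labels = [1, 1, -1, -1]

idf = fit_idf(train_docs)
train_vectors = [transform(d, idf) for d in train_docs]
weights, bias = train_perceptron(train_vectors, train_labels)

for text in ["فريق كرة", "بنك اقتصاد"]:
    print(text, predict(weights, bias, transform(text, idf)))
```

The study itself covers multi-class classification and 160 combinations; this sketch shows only the shape of a single feature-extraction and classifier pairing, and a production setup would more likely rely on an established library than on hand-written code.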
Cited by
1 article.
1. Hybrid Approach for Multi-Classification of News Documents Using Artificial Intelligence;2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV);2024-03-11