Abstract
Effective organization and retrieval of news content rely heavily on accurate news classification. While extensive research has been conducted on high-resource languages such as English and Chinese, research on under-resourced languages such as Kurdish is severely lacking. To address this gap, we introduce a hybrid approach called RFO-CNN. The proposed method combines an improved version of the red fox optimization (RFO) algorithm with a convolutional neural network (CNN), using RFO to fine-tune the CNN's parameters. The model's efficacy was tested on two widely used Kurdish news datasets, KNDH and KDC-4007, both of which contain news articles classified into various categories. We compared the performance of RFO-CNN with other state-of-the-art deep learning models, such as bidirectional long short-term memory networks and bidirectional encoder representations from transformers (BERT), as well as classical machine learning approaches such as multinomial naive Bayes, support vector machine, and K-nearest neighbors. We trained and tested the models using four train-to-test split ratios: 60:40, 70:30, 80:20, and 90:10. Our experimental results demonstrate the superiority of the RFO-CNN model across all four scenarios: it outperforms the benchmark BERT model and the other machine learning models in terms of accuracy and F1-score.
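The evaluation protocol described above (four split ratios, scored by accuracy and F1) can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names, the fixed shuffle seed, the choice of macro-averaged F1, and the majority-class placeholder classifier are all assumptions made for illustration, and any real experiment would substitute the actual model and the KNDH or KDC-4007 data.

```python
import random
from collections import Counter


def split(items, train_fraction, seed=0):
    """Shuffle a copy of items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an illustrative assumption
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(train):
    """Hypothetical placeholder model: always predicts the most common label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: most_common


def evaluate(dataset, fractions=(0.6, 0.7, 0.8, 0.9)):
    """Score the placeholder model under each train fraction."""
    results = {}
    for fraction in fractions:
        train, test = split(dataset, fraction)
        predict = majority_baseline(train)
        y_true = [label for _, label in test]
        y_pred = [predict(text) for text, _ in test]
        results[fraction] = (accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
    return results
```

Here `dataset` is assumed to be a list of `(text, label)` pairs. Replacing `majority_baseline` with a trained classifier keeps the rest of the loop unchanged, which makes it straightforward to compare several models under identical splits.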