The Hybrid Feature Selection k-means Method for Arabic Webpage Classification

Published: 2014-09-18 Volume: 70 Issue: 5
ISSN:2180-3722
Container-title:Jurnal Teknologi

Author:

Alghamdi Hanan, Selamat Ali

Abstract

The high dimensionality of the features found in the enormous amount of Arabic text available on the Internet is an important research problem in Web information retrieval. It reduces the accuracy of clustering algorithms and increases processing time. Selecting only the relevant features is the best solution. Therefore, in this paper, we propose a feature selection model that combines three feature selection methods (chi-squared, mutual information, and term frequency-inverse document frequency) into a hybrid feature selection model (Hybrid-FS) for k-means clustering. The model represents text data in a high-level structure consisting of three types of objects: terms, documents, and categories. We evaluate the model on a set of common Arabic online newspapers and assess the effect of using Hybrid-FS with standard k-means clustering. The experimental results show that the proposed method increases purity by 28% and lowers runtime by 80% compared with the standard k-means algorithm. We conclude that the proposed hybrid feature selection model enhances the accuracy of the k-means algorithm and produces coherent, compact, well-separated clusters when applied to high-dimensional datasets.
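The abstract reports clustering quality as purity. As a minimal sketch, assuming the standard definition (each cluster is credited with its most frequent true category, and the credited documents are divided by the total number of documents), purity could be computed as follows. The function name `purity` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of documents that belong to the majority category of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count how often each true category appears inside each cluster.
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster[cluster][label] += 1

    # Add up the size of the dominant category in every cluster, then normalize.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six documents, two clusters, two news categories.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["sport", "sport", "economy", "economy", "economy", "sport"]
score = purity(clusters, labels)
```

In this toy example each cluster contains two documents from its dominant category and one from the other, so four of the six documents are credited. A purity of 1.0 would mean that every cluster contains documents from a single category only.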

Publisher

Penerbit UTM Press

Subject

General Engineering

Cited by 7 articles.

1. New Model of Feature Selection based Chaotic Firefly Algorithm for Arabic Text Categorization;The International Arab Journal of Information Technology;2023

2. An Improved Chaotic Sine Cosine Firefly Algorithm for Arabic Feature Selection;Proceedings of the 6th International Conference on Big Data and Internet of Things;2023

3. A New Metaheuristic Approach Based Feature Selection for Arabic Text Categorization;2022 International Arab Conference on Information Technology (ACIT);2022-11-22

4. Designing a hybrid dimension reduction for improving the performance of Amharic news document classification;PLOS ONE;2021-05-21

5. Hybrid Feature Selection for Amharic News Document Classification;Mathematical Problems in Engineering;2021-03-11