A Novel Approach of Clustering Documents: Minimizing Computational Complexities in Accessing Database Systems
Published:2022
Issue:4
Volume:19
Page:
ISSN:2309-4524
Container-title:The International Arab Journal of Information Technology
language:en
Short-container-title:IAJIT
Author:
Alghobiri Mohammed, Mohiuddin Khalid, Abdul Khaleel Mohammed, Islam Mohammad, Shahwar Samreen, Nasr Osman
Abstract
This study addresses the real-time issue of managing an academic program's documents in a university environment. In practice, classifying documents from a corpus is challenging when the dataset is large, and the complexity increases further when specific document management requirements must be met. This study presents a practical approach to grouping documents based on a content similarity measure. The approach analyzes the performance of state-of-the-art clustering algorithms and considers Hamiltonian graph properties and a distance function. The distance function measures (1) the content similarity between documents and (2) the distances between the produced clusters. The proposed algorithm improves cluster quality by applying Hamiltonian graph properties. A significant characteristic of the proposed function is that it determines the document types from the corpus itself, so the number of clusters does not have to be assumed before the algorithm is executed. The approach thus avoids the arbitrary initial choice of k centroids required by the k-means algorithm, reduces computational complexity, and overcomes some limitations of commonly used clustering algorithms. It offers information systems developers an effective way to organize documents when designing document management systems.
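To make the idea of clustering without a preset number of clusters concrete, here is a minimal Python sketch. It is not the algorithm from the paper: the Hamiltonian graph step is not reproduced, and a simple threshold-based ("leader") clustering is swapped in as a stand-in. The names term_counts, cosine_distance, cluster_documents and the threshold value 0.7 are assumptions chosen for illustration. The same distance function is used both between a document and a cluster and between two clusters, loosely mirroring the two roles of the distance function described in the abstract.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercased bag-of-words term counts for one document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors (0 means same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def cluster_documents(docs, threshold=0.7):
    """Threshold-based (leader) clustering; the cluster count is not fixed in advance.

    A document joins the closest existing cluster when its distance to that
    cluster's summed term profile is below `threshold` (an assumed value);
    otherwise it starts a new cluster.
    """
    clusters = []  # each item: {"members": [doc indices], "profile": Counter}
    for i, doc in enumerate(docs):
        vec = term_counts(doc)
        best, best_d = None, threshold
        for c in clusters:
            d = cosine_distance(vec, c["profile"])
            if d < best_d:
                best, best_d = c, d
        if best is None:
            clusters.append({"members": [i], "profile": Counter(vec)})
        else:
            best["members"].append(i)
            best["profile"].update(vec)  # add this document's counts to the profile
    return clusters


# Hypothetical mini-corpus of academic program documents.
docs = [
    "course syllabus for database systems course",
    "database systems course syllabus and schedule",
    "faculty meeting minutes agenda",
    "minutes of the faculty meeting",
]
clusters = cluster_documents(docs)
print([c["members"] for c in clusters])  # → [[0, 1], [2, 3]]

# The same function gives a distance between two clusters via their profiles.
print(cosine_distance(clusters[0]["profile"], clusters[1]["profile"]))
```

In this sketch the number of groups emerges from the threshold rather than from a preset k, at the cost of being sensitive to document order and to the threshold itself; the paper's graph-based refinement is aimed at improving cluster quality beyond what such a simple pass provides.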
Publisher
Zarqa University
Subject
General Computer Science
Cited by
1 article.