Affiliation:
1. Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
2. División de Matemáticas e Ingeniería, FES Acatlán, Universidad Nacional Autónoma de México, Naucalpan, Mexico
Abstract
Document clustering has become an important task for processing the big amount of textual information available on the Internet. On the other hand, k-means is the most widely used algorithm for clustering, mainly due to its simplicity and effectiveness. However, k-means becomes slow for large and high dimensional datasets, such as document collections. Recently the FPAC algorithm was proposed to mitigate this problem, but the improvement in the speed was reached at the cost of reducing the quality of the clustering results. For this reason, in this paper, we introduce an improved FPAC algorithm, which, according our experiments on different document collections, allows obtaining better clustering results than FPAC, without highly increasing the runtime.
Subject
Artificial Intelligence,General Engineering,Statistics and Probability
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献