Improved fast partitional clustering algorithm for text clustering

Published:2020-08-31 Issue:2 Volume:39 Page:2137-2145
ISSN:1064-1246
Container-title:Journal of Intelligent & Fuzzy Systems
Short-container-title:IFS

Author:

Bejos Sebastián¹²,Feliciano-Avelino Ivan¹,Martínez-Trinidad J. Fco.¹,Carrasco-Ochoa J. A.¹

Affiliation:

1. Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico

2. División de Matemáticas e Ingeniería, FES Acatlán, Universidad Nacional Autónoma de México, Naucalpan, Mexico

Abstract

Document clustering has become an important task for processing the large amount of textual information available on the Internet. The k-means algorithm is the most widely used clustering algorithm, mainly due to its simplicity and effectiveness. However, k-means becomes slow for large, high-dimensional datasets such as document collections. Recently, the FPAC algorithm was proposed to mitigate this problem, but its speed improvement came at the cost of lower-quality clustering results. For this reason, in this paper we introduce an improved FPAC algorithm which, according to our experiments on different document collections, obtains better clustering results than FPAC without greatly increasing the runtime.

Publisher

IOS Press

Subject

Artificial Intelligence,General Engineering,Statistics and Probability

References: 8 articles.

1. A k-partitioning algorithm for clustering large-scale spatio-textual data;Choi;Inf Syst,2017

2. A survey of clustering algorithms for big data: Taxonomy and empirical analysis;Fahad;IEEE Transactions on Emerging Topics in Computing,2014

3. A review of clustering techniques and developments;Saxena;Neurocomputing,2017

4. A fast partitional clustering algorithm based on nearest neighbours heuristics;Ganguly;Pattern Recognition Letters,2018

Cited by 6 articles.

1. An incremental clustering algorithm based on semantic concepts;Knowledge and Information Systems;2024-02-15

2. An Exploration of the Connotation Characteristics of Data Civics and Its Effects in the Digital Era;Applied Mathematics and Nonlinear Sciences;2023-12-13

3. A new Chinese text clustering algorithm based on WRD and improved K-means;Intelligent Data Analysis;2023-07-20

4. Clustering Methods and Tools to Handle High-Dimensional Social Media Text Data;Advances in Social Networking and Online Communities;2023-06-09

5. Design of an Intelligent Processing System for Business Data Analysis Based on Improved Clustering Algorithm;Procedia Computer Science;2023