Abstract
The rapid growth of the Internet produces massive amounts of text data. Because this data is both voluminous and unstructured, extracting and analyzing the relevant information is very challenging. Text document clustering is a text-mining process that partitions a set of text documents into mutually exclusive clusters so that documents within the same cluster are similar to each other, while documents from different clusters differ in content. One of the biggest challenges in text clustering is partitioning the collection by measuring how closely the content of the documents is related. To address this issue, this work proposes a hybrid swarm intelligence algorithm combined with the K-means algorithm for text clustering. First, the hybrid fruit-fly optimization algorithm is tested on ten unconstrained CEC2019 benchmark functions. Next, the proposed method is evaluated on six standard benchmark text datasets. The experimental evaluation, on both the unconstrained functions and the text documents, indicates that the proposed approach is robust and superior to the state-of-the-art methods used for comparison.
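As a rough illustration of the K-means side of such a pipeline, the sketch below clusters a few toy documents by cosine similarity over TF-IDF vectors. It is not the authors' hybrid method: the fruit-fly component is not shown, the helper names (`tfidf_vectors`, `cosine`, `mean_vector`, `kmeans`), the whitespace tokenizer, the smoothed IDF formula, and the fixed seed documents are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn raw strings into unit-length sparse TF-IDF vectors (term -> weight)."""
    tokenised = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF; one common variant, chosen here only for illustration.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def mean_vector(vectors):
    """Sum the member vectors and rescale to unit length to get a centroid."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}

def kmeans(vectors, seed_indices, max_iter=20):
    """Plain K-means with cosine similarity, seeded from the given documents."""
    centroids = [dict(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
                      for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels

docs = [
    "cat dog pet",
    "dog pet animal",
    "stock market trading",
    "market trading price",
]
labels = kmeans(tfidf_vectors(docs), seed_indices=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

The result of plain K-means depends on the initial centroids, which are fixed here by hand to keep the example deterministic.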
Funder
Romanian Ministry of Education and Research
Ministarstvo Prosvete, Nauke i Tehnološkog Razvoja
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by
90 articles.