Affiliation:
1. Hankuk University of Foreign Studies, Seoul, Korea
Abstract
This article presents a modified genetic algorithm for text document clustering on the cloud. Traditional genetic-algorithm approaches to document clustering represent each chromosome as a set of cluster centroids and never split a centroid during crossover. This limits the variation the algorithm can introduce into the population, so it tends to become trapped in local minima. In the proposed approach, a crossover point may be selected even at a position inside a cluster centroid, which allows some cluster centroids to be modified. This helps the algorithm escape local minima and find better solutions than the traditional approaches. Moreover, instead of running a single genetic algorithm as the traditional approaches do, this article partitions the population and runs a genetic algorithm on each partition. This makes it possible to run different parts of the algorithm simultaneously on different virtual machines in a cloud environment. Experimental results show that the accuracy of the proposed approach is at least 4% higher than that of the other approaches.
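The two ideas in the abstract, crossover that may cut inside a centroid and a population split into independently evolved partitions, can be illustrated with a small sketch. It is not the authors' implementation. All names are hypothetical, and so are the following choices: a chromosome is the k centroids concatenated into one flat list of floats, documents are dense vectors, fitness is the sum of squared Euclidean distances to the nearest centroid (the abstract does not name a distance; cosine similarity is common for text), selection keeps the better half, mutation adds Gaussian noise, partitions never exchange individuals, and the best result across partitions is returned. The partitions run one after another here. Because each is independent, each call could instead be dispatched to its own virtual machine.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[float]  # hypothetical encoding: k centroids of dimension d, concatenated
MUTATION_RATE = 0.1       # hypothetical parameter
MUTATION_SCALE = 0.05     # hypothetical parameter


def decode(chrom: Chromosome, k: int, d: int) -> List[List[float]]:
    """Split a flat chromosome into k centroid vectors of length d."""
    return [chrom[i * d:(i + 1) * d] for i in range(k)]


def sse(chrom: Chromosome, docs: Sequence[Sequence[float]], k: int, d: int) -> float:
    """Sum of squared distances from each document to its nearest centroid (lower is better)."""
    centroids = decode(chrom, k, d)
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(doc, cen)) for cen in centroids)
        for doc in docs
    )


def boundary_crossover(p1, p2, k: int, d: int, rng) -> Tuple[Chromosome, Chromosome]:
    """Traditional style: cut only between whole centroids, so every centroid is copied intact."""
    cut = rng.randint(1, k - 1) * d
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def gene_crossover(p1, p2, rng) -> Tuple[Chromosome, Chromosome]:
    """Modified style: cut at any gene, possibly inside a centroid, producing a new centroid."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chrom: Chromosome, rng) -> Chromosome:
    """Add Gaussian noise to each gene with probability MUTATION_RATE."""
    return [g + rng.gauss(0.0, MUTATION_SCALE) if rng.random() < MUTATION_RATE else g
            for g in chrom]


def evolve_partition(pop, docs, k: int, d: int, generations: int, rng) -> Chromosome:
    """Run a simple GA on one partition (needs at least 2 chromosomes); return its best chromosome."""
    for _ in range(generations):
        pop = sorted(pop, key=lambda c: sse(c, docs, k, d))
        parents = pop[:max(2, len(pop) // 2)]  # keep the better half as parents
        children = [pop[0]]                     # elitism: the best survives unchanged
        while len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            c1, c2 = gene_crossover(a, b, rng)
            children.append(mutate(c1, rng))
            if len(children) < len(pop):
                children.append(mutate(c2, rng))
        pop = children
    return min(pop, key=lambda c: sse(c, docs, k, d))


def partitioned_ga(docs, k: int, pop_size: int, n_parts: int, generations: int,
                   seed: int = 0) -> Chromosome:
    """Split the population into n_parts partitions and evolve each one independently."""
    rng = random.Random(seed)
    d = len(docs[0])
    # Initialise each chromosome from k distinct documents chosen at random.
    population = [[g for doc in rng.sample(docs, k) for g in doc] for _ in range(pop_size)]
    parts = [population[i::n_parts] for i in range(n_parts)]
    # Each call is independent, so each could run on a separate virtual machine.
    bests = [evolve_partition(part, docs, k, d, generations, random.Random(seed + i + 1))
             for i, part in enumerate(parts)]
    return min(bests, key=lambda c: sse(c, docs, k, d))
```

For example, `partitioned_ga([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]], k=2, pop_size=8, n_parts=2, generations=20)` evolves two partitions of four chromosomes each. With parents `[0.0] * 6` and `[1.0] * 6` and `k=3, d=2`, `boundary_crossover` can only yield centroids `[0.0, 0.0]` or `[1.0, 1.0]`. `gene_crossover` can also yield the mixed centroid `[0.0, 1.0]`, which is the extra variation the abstract credits with helping the search escape local minima.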