Affiliation:
1. Department of Computer Science, Faculty of Science of Rabat, Mohammed V University, 4 Avenue Ibn Battouta B. P. 1014 RP, Rabat, Morocco
Abstract
Recently, MapReduce-based implementations of clustering algorithms have been developed to cope with the Big Data phenomenon, and they show promising results, particularly for the document clustering problem. In this paper, we extend PDC-Transitive, an efficient data partitioning method based on the relational analysis (RA) approach and applied to the document clustering problem. This heuristic is parallelised iteratively using the MapReduce model and is designed with a single reducer, which represents a bottleneck when processing large data. We improve the design of the PDC-Transitive method to avoid the data dependencies and reduce the computation cost. Experimental results on benchmark datasets demonstrate that the enhanced heuristic yields better quality results and requires less computing time than the original method.
Publisher
World Scientific Pub Co Pte Lt
Subject
Library and Information Sciences, Computer Networks and Communications, Computer Science Applications