Decreasing the execution time of reducers by revising clustering based on the futuristic greedy approach-Reference-Cited by-同舟云学术

Decreasing the execution time of reducers by revising clustering based on the futuristic greedy approach

Published:2020-01-09 Issue:1 Volume:7 Page:
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Bakhthemmat Ali,Izadi Mohammad

Abstract

AbstractMapReduce is used within the Hadoop framework, which handles two important tasks: mapping and reducing. Data clustering in mappers and reducers can decrease the execution time, as similar data can be assigned to the same reducer with one key. Our proposed method decreases the overall execution time by clustering and lowering the number of reducers. Our proposed algorithm is composed of five phases. In the first phase, data are stored in the Hadoop structure. In the second phase, we cluster data using the MR-DBSCAN-KD method in order to determine all of the outliers and clusters. Then, the outliers are assigned to the existing clusters using the futuristic greedy method. At the end of the second phase, similar clusters are merged together. In the third phase, clusters are assigned to the reducers. Note that fewer reducers are required for this task by applying approximated load balancing between the reducers. In the fourth phase, the reducers execute their jobs in each cluster. Eventually, in the final phase, reducers return the output. Decreasing the number of reducers and revising the clustering helped reducers to perform their jobs almost simultaneously. Our research results indicate that the proposed algorithm improves the execution time by about 3.9% less than the fastest algorithm in our experiments.

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management,Computer Networks and Communications,Hardware and Architecture,Information Systems

Link

http://link.springer.com/content/pdf/10.1186/s40537-019-0279-z.pdf

Reference31 articles.

1. Tsai C-W, Lai C-F, Chao H-C, Vasilakos AV. Big data analytics: a survey. J Big data. 2015;2(1):21.

2. Sanse K, Sharma M. Clustering methods for Big data analysis. Int J Adv Res Comput Eng Technol. 2015;4(3):642–8.

3. Zhao W, Ma H, He Q. Parallel k-means clustering based on mapreduce. In: IEEE international conference on cloud computing. 2009. p. 674–9.

4. Srivastava DK, Yadav R, Agrwal G. Map reduce programming model for parallel K-mediod algorithm on hadoop cluster. In: 2017 7th international conference on communication systems and network technologies (CSNT). 2017. p. 74–8.

5. Dai B-R, Lin I-C. Efficient map/reduce-based dbscan algorithm with optimized data partition. In: 2012 IEEE Fifth international conference on cloud computing. 2012. p. 59–66.

Cited by 11 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. MapReduce algorithms for robust center-based clustering in doubling metrics;Journal of Parallel and Distributed Computing;2024-12

2. Scalable and space-efficient Robust Matroid Center algorithms;Journal of Big Data;2023-04-17

3. Distributed k-Means with Outliers in General Metrics;Euro-Par 2023: Parallel Processing;2023

4. NDPD: an improved initial centroid method of partitional clustering for big data mining;Journal of Advances in Management Research;2022-08-23

5. Min‐max kurtosis stratum mean: An improved K‐means cluster initialization approach for microarray gene clustering on multidimensional big data;Concurrency and Computation: Practice and Experience;2022-07-16