MAPSkew: Metaheuristic Approaches for Partitioning Skew in MapReduce

Published: 2018-12-24
Volume: 12, Issue: 1, Page: 5
ISSN: 1999-4893
Journal: Algorithms
Language: en

Authors:

Pericini Matheus, Leite Lucas, de Carvalho-Junior Francisco, Machado Javam, Rezende Cenez

Abstract

MapReduce is a parallel computing model in which a large dataset is split into smaller parts that are processed in parallel on multiple machines. Because of its simplicity, MapReduce has been widely used in various application domains, and it can significantly reduce the time needed to process large amounts of data. However, when data are not uniformly distributed, so-called partitioning skew arises: the allocation of tasks to machines becomes unbalanced, either because the function that distributes the data splits the dataset unevenly or because some part of the data is more complex and requires greater computational effort. To address this problem, we propose an approach based on metaheuristics. For evaluation purposes, three metaheuristics were implemented: Simulated Annealing, Local Beam Search, and Stochastic Beam Search. Our experimental evaluation, using a MapReduce implementation of the Bron-Kerbosch clique algorithm, shows that the proposed method can find good partitionings while balancing data among machines more evenly.
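The abstract does not spell out the cost model or the search moves, so the following is only a minimal Python sketch of one of the three metaheuristics (Simulated Annealing) applied to partitioning skew, under assumptions that are not from the paper: each intermediate key has a known, hypothetical processing cost (covering both uneven key frequencies and data that is more expensive to process), and the goal is to minimise the load of the busiest reducer. The function names (`reducer_loads`, `makespan`, `modulo_partition`, `simulated_annealing`), the parameters `t0`, `cooling`, and `steps`, and the example costs are illustrative only.

```python
import math
import random


def reducer_loads(assignment, key_costs, num_reducers):
    """Sum the (hypothetical) cost of every key assigned to each reducer."""
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_costs[key]
    return loads


def makespan(assignment, key_costs, num_reducers):
    """Assumed objective to minimise: the load of the busiest reducer."""
    return max(reducer_loads(assignment, key_costs, num_reducers))


def modulo_partition(key_costs, num_reducers):
    """Cost-oblivious baseline for integer keys: key modulo reducer count,
    loosely mimicking a default hash partitioner."""
    return {key: key % num_reducers for key in key_costs}


def simulated_annealing(key_costs, num_reducers, initial,
                        t0=10.0, cooling=0.995, steps=20_000, seed=0):
    """Search for a key-to-reducer assignment with a lower makespan.

    Neighbourhood move (an assumption): reassign one random key to a
    different random reducer. Worse moves are accepted with probability
    exp(-delta / T), and T decays geometrically after every step.
    """
    if num_reducers < 2:  # a single reducer cannot be rebalanced
        return dict(initial), makespan(initial, key_costs, num_reducers)
    rng = random.Random(seed)
    current = dict(initial)
    loads = reducer_loads(current, key_costs, num_reducers)
    current_cost = max(loads)
    best, best_cost = dict(current), current_cost
    keys = list(key_costs)
    temperature = t0
    for _ in range(steps):
        key = rng.choice(keys)
        old = current[key]
        new = rng.randrange(num_reducers - 1)
        if new >= old:
            new += 1  # skip the key's current reducer, so new != old
        cost_of_key = key_costs[key]
        loads[old] -= cost_of_key
        loads[new] += cost_of_key
        cost = max(loads)
        delta = cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current[key] = new  # accept the move
            current_cost = cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:  # reject the move: restore the previous loads
            loads[old] += cost_of_key
            loads[new] -= cost_of_key
        temperature = max(temperature * cooling, 1e-9)
    return best, best_cost


if __name__ == "__main__":
    R = 4
    # Hypothetical skewed workload: key k costs about 1000 / (k + 1) units.
    costs = {k: 1000 // (k + 1) for k in range(40)}
    baseline = modulo_partition(costs, R)
    best, best_cost = simulated_annealing(costs, R, baseline)
    print("modulo partitioning makespan:", makespan(baseline, costs, R))
    print("simulated annealing makespan:", best_cost)
```

Because the search starts from the baseline assignment and keeps the best assignment seen, its result can never be worse than that baseline. Local Beam Search and Stochastic Beam Search, the other two metaheuristics named above, would instead keep several candidate assignments per iteration rather than a single current one.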

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

Link

http://www.mdpi.com/1999-4893/12/1/5/pdf

Cited by 2 articles

1. HTD: heterogeneous throughput-driven task scheduling algorithm in MapReduce. Distributed and Parallel Databases, 2021-10-28.

2. PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining. Multimedia Systems, 2021-03-13.