Abstract
AbstractTask stragglers in MapReduce jobs dramatically impede job execution of data-intensive computing in cloud data centers. This impedance is due to the uneven distribution of input data, heterogeneous data nodes, resource contention situations, and network configurations. Data skew of intermediate data in MapReduce job causes delay failures due to the violation of job completion time. Data-intensive computing frameworks, such as MapReduce or Hadoop YARN, employ HashPartitioner. This partitioner may cause intermediate data skew, which results in straggler reducers. In this paper, we strive to make Hadoop YARN more efficient in cloud environments. We present, a new partitioning scheme, called balanced data clusters partitioner (BDCP), to handle straggler Reduce tasks based on sampling of input data and feedback information about the current processing task. Our extensive experimental results show that BDCP can outperform the default Hadoop HashPartitioner and Range partitioner. BDCP can assist in straggler mitigation during reduce phase and minimize the job completion time in MapReduce jobs within data-intensive cloud computing.
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications,Software
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献