Authors:
Al-Fatlawi Ahmed Abdul Hassan, Mohammed Ghassan N., Al Barazanchi Israa
Abstract
Hash functions are an integral part of MapReduce software, in both Apache Hadoop and Spark. If the hash function performs badly, the load in the reduce phase becomes unbalanced and access times spike. To investigate this problem, we ran the WordCount program with several different hash functions on Amazon AWS, using the Amazon Elastic MapReduce (EMR) framework. The paper investigates general-purpose, cryptographic, checksum, and special-purpose hash functions, and presents the corresponding runtime results of this analysis.
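To illustrate the role the hash function plays in reduce-side load balancing, the following is a minimal sketch (not taken from the paper) of a custom Hadoop partitioner. Hadoop's built-in HashPartitioner assigns each key to a reducer via key.hashCode(); the class name CustomHashPartitioner and the choice of FNV-1a as the plugged-in hash are illustrative assumptions standing in for one of the hash functions under test.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of a custom partitioner for a WordCount job: the hash function chosen
// here decides which reduce task receives each word. A skewed hash concentrates
// keys on a few reducers, and the reduce-phase runtime spikes accordingly.
public class CustomHashPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Mask off the sign bit, then map the hash onto the available reducers,
        // mirroring what Hadoop's default HashPartitioner does with hashCode().
        int hash = fnv1a(key.toString());
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // FNV-1a: one example of a general-purpose (non-cryptographic) hash function.
    private static int fnv1a(String s) {
        int h = 0x811C9DC5;            // FNV offset basis (32-bit)
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;           // FNV prime (32-bit)
        }
        return h;
    }
}
```

A job would select this partitioner with job.setPartitionerClass(CustomHashPartitioner.class); swapping the body of fnv1a is enough to compare different hash functions under otherwise identical WordCount runs.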
Publisher
Southwest Jiaotong University
References (36 articles)
1. HADOOP, A. (2018) Hadoop. [Online] Available from: http://hadoop.apache.org [Accessed 17/09/19].
2. SPARK, A. (2016) Apache Spark: Lightning-fast cluster computing. [Online] Available from: https://sur.ly/o/spark.apache.org/AA000014 [Accessed 17/09/19].
3. BIANCHINI, M., GORI, M., and SCARSELLI, F. (2005) Inside PageRank. ACM Transactions on Internet Technology, 5 (1), pp. 92-128.
4. HE, B., FANG, W., LUO, Q., GOVINDARAJU, N.K., and WANG, T. (2008) Mars: a MapReduce framework on graphics processors. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, Toronto, October 2008. New York: Association for Computing Machinery, pp. 260-269.
5. KATSOULIS, S. (2011) Implementation of Parallel Hash Join Algorithms over Hadoop. Edinburgh: University of Edinburgh.
Cited by
1 article.