Performance Challenges and Solutions in Big Data Platform Hadoop

Published: 2023-11 Issue: 9 Volume: 16
ISSN:2666-2558
Container-title:Recent Advances in Computer Science and Communications
language:en
Short-container-title:RACSC

Author:

Singh Balraj¹², Verma Harsh K¹, Madaan Vishu²

Affiliation:

1. Department of Computer Science and Engineering, Dr. B.R. Ambedkar NIT, Jalandhar, India

2. School of Computer Science and Engineering, Lovely Professional University, Phagwara, India

Abstract

Background: The present era demands continuous improvement in executing complex analytics on large-scale data, beyond what traditional systems can handle.

Objective: The need to process diverse data types and to deliver solutions for different industry domains is rising. These needs call for more sophisticated techniques and methods to further enhance existing platforms and mechanisms, and they give the research community an opportunity to examine existing systems, find potential issues, and propose improvements. Hadoop is a popular choice for managing and processing big data. It is an open-source platform and a front-runner in the batch processing of large-scale jobs, and the cost of scaling a cluster is low compared with other platforms. However, this popularity by no means guarantees high performance in all scenarios. As data and industrial requirements continue to evolve, it is imperative to investigate new methods and techniques to advance the existing system.

Method: This paper presents a systematic review to give insight into current progress in this field. Research publications from various sources are collected and analyzed. The performance of a cluster depends largely on the job processing mechanisms and policies associated with it.

Conclusion: Although extensive studies and solutions have been proposed, performance bottlenecks in load balancing, resource utilization, content management, and efficient processing persist. Few scheduling solutions address the trade-off between different parameters; the process of content splitting and merging has not been explored to a large extent; and skew mitigation solutions focus mainly on the Reduce side of MapReduce, while the Map side is little used for load balancing.

Publisher

Bentham Science Publishers Ltd.

Subject

General Computer Science

Cited by 1 article.

1. Big Data, Bigger Challenges: A Comparative Study of Performance Testing; 2023 Seventh International Conference on Image Information Processing (ICIIP); 2023-11-22