Affiliation:
1. Department of Computer Science and Engineering, Dr. B.R. Ambedkar NIT, Jalandhar, India
2. School of Computer Science and Engineering, Lovely Professional University, Phagwara, India
Abstract
Background:
The present era demands continuous improvement in executing complex analytics on
large-scale data and in working beyond the limits of traditional systems.
Objective:
The need to process diverse data types and to provide solutions for different industry
domains is rising. These needs call for more sophisticated techniques and methods to
enhance existing platforms and mechanisms. This gives the research
community an opportunity to investigate existing systems, identify potential issues, and propose new
ways to improve them. Hadoop is a popular choice for managing and processing Big Data.
It is an open-source platform and a front-runner in the batch processing of large-scale jobs. The
cost of scaling a cluster is low compared to other platforms. However, this
popularity by no means guarantees high performance in all scenarios. With the continuous evolution
of data and industrial requirements, it is imperative to investigate
new methods and techniques that advance the existing system.
Method:
A systematic review is presented in this paper to provide insight into the current progress
in this field. Research publications from various sources are collected and analyzed. The performance
of a cluster largely depends on the job processing mechanisms and policies associated
with it.
Conclusion:
Although extensive studies and solutions have been proposed, performance bottlenecks in
load balancing, resource utilization, content management, and efficient processing persist.
Few scheduling solutions address the trade-off between different parameters;
the process of content splitting and merging has not been explored to a large extent; and skew mitigation
solutions focus mainly on the Reduce side of MapReduce, while the Map side is little used
for load balancing.
Publisher
Bentham Science Publishers Ltd.