Authors:
Skrzypczyński Mikołaj, Muryjas Piotr
Abstract
This paper analyzes the data processing efficiency of Apache Hive and Apache Pig in a Hadoop environment. The analysis compares the two tools on a large data set of 28 million records. The research used scripts and queries written for Apache Hive and Apache Pig, each executed 10 times in an environment provided by a purpose-built virtual machine. These methods were run on the same data sets 16 times, according to previously prepared research scenarios. The authors conclude that Apache Hive is the more efficient tool of the two.