On the performance of SQL scalable systems on Kubernetes: a comparative study

Published:2022-09-09
ISSN:1386-7857
Container-title:Cluster Computing
language:en
Short-container-title:Cluster Comput

Author:

Cardas Cristian,Aldana-Martín José F.,Burgueño-Romero Antonio M.,Nebro Antonio J.,Mateos Jose M.,Sánchez Juan J.

Abstract

The popularization of Hadoop as the de facto standard platform for data analytics in Big Data applications has led to an upsurge of SQL-on-Hadoop systems, which provide scalable query execution engines that allow SQL queries to be run on data stored in HDFS. In this context, Kubernetes appears as the leading choice to simplify the deployment and scaling of containerized applications; however, there is a lack of studies on the performance of SQL-on-Hadoop systems deployed on Kubernetes, and this is the gap we intend to fill in this paper. We present an experimental study involving four representative SQL scalable platforms: Apache Drill, Apache Hive, Apache Spark SQL and Trino. Concretely, we use the TPC-H benchmark to analyze the performance of these systems when they are deployed on a Hadoop cluster with Kubernetes. The results of our study can help practitioners and users understand what to expect in terms of performance if they plan to take advantage of Kubernetes to deploy applications using the analyzed SQL scalable platforms.

Funder

Universidad de Málaga

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications,Software

Link

https://link.springer.com/content/pdf/10.1007/s10586-022-03718-9.pdf

Reference: 19 articles.

1. White, T.: Hadoop: The Definitive Guide. O’Reilly Media, Sebastopol (2009)

2. Capriolo, E., Wampler, D., Rutherglen, J.: Programming Hive: Data Warehouse and Query Language for Hadoop. O’Reilly Media, Sebastopol (2012)

3. Russell, J.: Getting Started with Impala: Interactive SQL for Apache Hadoop. O’Reilly Media, Sebastopol (2014)

4. Fuller, M., Traverso, M., Moser, M.: Trino: The Definitive Guide. O’Reilly Media, Sebastopol (2021)

5. Givre, C., Rogers, P.: Learning Apache Drill: Query and Analyze Distributed Data Sources with SQL. O’Reilly Media, Sebastopol (2018)

Cited by 3 articles.

1. PAC: A monitoring framework for performance analysis of compression algorithms in Spark;Future Generation Computer Systems;2024-08

2. Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the Cloud;Companion of the 2024 International Conference on Management of Data;2024-06-09

3. Privacy-preserving Data Federation for Trainable, Queryable and Actionable Data;2023 IEEE 39th International Conference on Data Engineering Workshops (ICDEW);2023-04