Affiliation:
1. The University of Chicago
Abstract
As users migrate their analytical workloads to cloud databases, reducing monetary cost is becoming as important as optimizing query runtime. In the cloud, a query is billed based on either its compute time or the amount of data it processes. We observe that analytical queries are either compute-bound or IO-bound, and that each query type is cheaper to execute under a different pricing model. We exploit this opportunity and propose methods to build cheaper execution plans that span pricing models and complete within user-defined runtime constraints. We implement these methods and produce execution plans spanning multiple pricing models that reduce the monetary cost of workloads by as much as 56% and of individual queries by as much as 90%. The prices that cloud vendors set for their services also affect savings opportunities. To study this effect, we simulate our proposed methods under different cloud prices and observe that multi-cloud savings are robust to changes in vendor prices. These results indicate a substantial opportunity to save money by executing workloads across multiple pricing models.
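The billing distinction in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the two price constants, the `Query` fields, and the function names are hypothetical and chosen only for illustration. The sketch estimates a query's cost under each pricing model and picks the cheaper one.

```python
from dataclasses import dataclass

# Hypothetical prices for illustration only; not taken from any vendor.
PRICE_PER_COMPUTE_SECOND = 0.001  # dollars per second of compute time
PRICE_PER_TB_PROCESSED = 5.0      # dollars per terabyte of data processed


@dataclass
class Query:
    name: str
    compute_seconds: float  # assumed estimate of compute time
    tb_processed: float     # assumed estimate of data processed, in TB


def cost_pay_per_compute(q: Query) -> float:
    """Cost when the query is billed by its compute time."""
    return q.compute_seconds * PRICE_PER_COMPUTE_SECOND


def cost_pay_per_byte(q: Query) -> float:
    """Cost when the query is billed by the amount of data it processes."""
    return q.tb_processed * PRICE_PER_TB_PROCESSED


def cheaper_model(q: Query) -> tuple[str, float]:
    """Return the name and cost of the cheaper pricing model for this query."""
    by_compute = cost_pay_per_compute(q)
    by_byte = cost_pay_per_byte(q)
    if by_compute <= by_byte:
        return ("pay-per-compute", by_compute)
    return ("pay-per-byte", by_byte)


# Two made-up queries: one dominated by compute, one dominated by IO.
compute_bound = Query("compute_bound", compute_seconds=3600.0, tb_processed=0.01)
io_bound = Query("io_bound", compute_seconds=60.0, tb_processed=2.0)

for q in (compute_bound, io_bound):
    model, cost = cheaper_model(q)
    print(f"{q.name}: {model} (${cost:.2f})")
```

With these assumed numbers, the long-running query that touches little data comes out cheaper when billed by data processed, and the short query that scans a lot of data comes out cheaper when billed by compute time. Meeting a user-defined runtime constraint would need a further check that this sketch leaves out.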
Publisher
Association for Computing Machinery (ACM)