Automated Translation of Functional Big Data Queries to SQL

Published: 2023-04-06 Issue: OOPSLA1 Volume: 7 Page: 580-608
ISSN: 2475-1421
Container-title: Proceedings of the ACM on Programming Languages
Language: en
Short-container-title: Proc. ACM Program. Lang.

Author:

Zhang Guoqiang¹, Mariano Benjamin², Shen Xipeng¹, Dillig Işıl²

Affiliation:

1. North Carolina State University, USA

2. University of Texas at Austin, USA

Abstract

Big data analytics frameworks like Apache Spark and Flink enable users to implement queries over large, distributed databases using functional APIs. In recent years, these APIs have grown in popularity because their functional interfaces abstract away much of the minutiae of distributed programming required by traditional query languages like SQL. However, the convenience of these APIs comes at a cost because functional queries are often less efficient than their SQL counterparts. Motivated by this observation, we present a new technique for automatically transpiling functional queries to SQL. While our approach is based on the standard paradigm of counterexample-guided inductive synthesis, it uses a novel column-wise decomposition technique to split the synthesis task into smaller subquery synthesis problems. We have implemented this approach as a new tool called RDD2SQL for translating Spark RDD queries to SQL and empirically evaluate the effectiveness of RDD2SQL on a set of real-world RDD queries. Our results show that (1) most RDD queries can be translated to SQL, (2) our tool is very effective at automating this translation, and (3) performing this translation offers significant performance benefits.
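As a rough illustration of what translating a functional query to SQL means, the sketch below pairs a small filter-then-sum-by-key pipeline, written in plain Python in the style of an RDD query, with a candidate SQL query and checks that the two agree on sample inputs. This is not RDD2SQL and does not reproduce the paper's synthesis algorithm; the `orders` table, the sample rows, and all function names are hypothetical, and `sqlite3` stands in for a real SQL engine. Agreement on test inputs is only evidence of equivalence, not a proof.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, amount) rows.
ROWS = [("ann", 30), ("bob", 5), ("ann", 20), ("cy", 12), ("bob", 40)]


def functional_query(rows):
    """RDD-style pipeline: keep amounts > 10, then sum amounts per customer."""
    kept = filter(lambda r: r[1] > 10, rows)
    totals = defaultdict(int)
    for customer, amount in kept:  # plays the role of a reduce-by-key step
        totals[customer] += amount
    return sorted(totals.items())


# Candidate SQL translation of the pipeline above (an assumption for this sketch).
SQL_QUERY = """
    SELECT customer, SUM(amount)
    FROM orders
    WHERE amount > 10
    GROUP BY customer
    ORDER BY customer
"""


def sql_query(rows):
    """Run the candidate SQL over the same rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(SQL_QUERY).fetchall()
    conn.close()
    return result


def agree_on(rows):
    """Check that both queries return the same result on one input."""
    return functional_query(rows) == sql_query(rows)


if __name__ == "__main__":
    print(functional_query(ROWS))
    print(agree_on(ROWS))
```

In a counterexample-guided loop, an input on which `agree_on` returns `False` would be the kind of counterexample used to reject a candidate translation and refine the search.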

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3586047

Cited by 1 article.

1. Relational Expressions for Data Transformation and Computation;Lecture Notes in Computer Science;2023-11-07