Affiliation:
1. University of Washington, USA
Abstract
The ability to reason about relational queries plays an important role across many types of database applications, such as test data generation, query equivalence checking, and computer-assisted query authoring. Unfortunately, symbolic reasoning about relational queries can be challenging because relational tables are multisets (bags) of tuples, and the underlying languages, such as SQL, can introduce complex computation among tuples.
We propose a space refinement algorithm that soundly reduces the space of tables such applications need to consider. The refinement procedure, independent of the specific database application, uses the abstract semantics of the query language to exploit the provenance of tuples in the query output to prune the search space. We implemented the refinement algorithm and evaluated it on SQL using three reasoning tasks: bounded query equivalence checking, test generation for applications that manipulate relational data, and concolic testing of database applications. Using real-world benchmarks, we show that our refinement algorithm significantly speeds up the SQL solver (by up to 100×) when reasoning about a large class of challenging SQL queries, such as those with aggregations.
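To make the setting concrete, here is a small Python sketch of naive, brute-force bounded query equivalence checking under bag semantics. It is not the refinement algorithm described above. The toy queries, the helper name `bounded_equivalence`, the single-column table, and the tiny integer domain are all hypothetical and chosen only for illustration. The sketch enumerates every bag of up to `max_rows` tuples and compares the two query outputs as multisets.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical toy queries over a single-column table t(a), written as
# Python functions that return their output as a multiset (Counter).
def q_filter(table):
    # SELECT a FROM t WHERE a > 0   (bag semantics: duplicates kept)
    return Counter(a for a in table if a > 0)

def q_filter_distinct(table):
    # SELECT DISTINCT a FROM t WHERE a > 0   (duplicates removed)
    return Counter(set(a for a in table if a > 0))

def bounded_equivalence(q1, q2, domain, max_rows):
    """Enumerate every bag of at most max_rows tuples drawn from domain.

    Returns the first table on which q1 and q2 produce different
    multisets, or None if they agree on all tables within the bound.
    """
    for n in range(max_rows + 1):
        # A table is a bag, so row order is irrelevant: enumerate
        # multisets of size n rather than all ordered lists.
        for rows in combinations_with_replacement(domain, n):
            table = list(rows)
            if q1(table) != q2(table):
                return table  # counterexample table
    return None
```

For example, `bounded_equivalence(q_filter, q_filter_distinct, [0, 1], 3)` returns `[1, 1]`: the two queries agree under set semantics but differ once a table contains duplicate rows. The number of candidate tables grows combinatorially with the domain size and the row bound, which illustrates the kind of table space that the abstract's refinement procedure is meant to shrink. The sketch itself performs no pruning.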
Funder
Defense Advanced Research Projects Agency
National Science Foundation
U.S. Department of Energy
Publisher
Association for Computing Machinery (ACM)
Subject
Safety, Risk, Reliability and Quality, Software
Cited by
7 articles.
1. VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints;Proceedings of the ACM on Programming Languages;2024-04-29
2. Predicate Pushdown for Data Science Pipelines;Proceedings of the ACM on Management of Data;2023-06-13
3. Verifying Data Constraint Equivalence in FinTech Systems;2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE);2023-05
4. Active Learning for Inference and Regeneration of Applications that Access Databases;ACM Transactions on Programming Languages and Systems;2021-02
5. Provenance-guided synthesis of Datalog programs;Proceedings of the ACM on Programming Languages;2020-01