Affiliation:
1. University of Wisconsin-Madison, Madison, WI, USA
2. University of Mons, Mons, Belgium
Abstract
Data analytical pipelines often encounter the problem of querying inconsistent data that violate predetermined integrity constraints. Data cleaning is an extensively studied paradigm that singles out one consistent repair of the inconsistent data. Consistent query answering (CQA) is an alternative to data cleaning that asks for all tuples guaranteed to be returned by a given query on every repair of the inconsistent data (in most cases there are exponentially many repairs). In this paper, we identify a class of acyclic select-project-join (SPJ) queries for which CQA can be solved via SQL rewriting with a linear time guarantee. Our rewriting method can be viewed as a generalization of Yannakakis' algorithm for acyclic joins to the inconsistent setting. We present LinCQA, a system that takes as input any query in our class and outputs rewritings in both SQL and non-recursive Datalog with negation. We show that LinCQA often outperforms existing CQA systems on both synthetic and real-world workloads, in some cases by orders of magnitude.
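To make the definition of a consistent answer concrete, the following is a minimal brute-force sketch, not the rewriting method of the paper. It assumes that the integrity constraints are primary keys, which the abstract does not state, so that a repair keeps exactly one tuple per key value. The relation names, the sample data, and the helper names `repairs`, `consistent_answers`, and `paris_employees` are all hypothetical. The sketch enumerates every repair, so its running time is exponential; avoiding that enumeration is what a linear-time rewriting would buy.

```python
from collections import defaultdict
from itertools import product


def repairs(relation, key_len):
    """Yield every repair of one relation, assuming a primary key made of
    the first `key_len` attributes: keep exactly one tuple per key value."""
    blocks = defaultdict(list)
    for row in relation:
        blocks[row[:key_len]].append(row)
    # One choice per block of key-equal tuples; the product is exponential.
    for choice in product(*blocks.values()):
        yield set(choice)


def consistent_answers(db, key_lens, query):
    """Intersect the query result over all repairs of the database."""
    names = list(db)
    per_relation = [list(repairs(db[n], key_lens[n])) for n in names]
    answers = None
    for combo in product(*per_relation):
        result = set(query(dict(zip(names, combo))))
        answers = result if answers is None else answers & result
    return answers if answers is not None else set()


# Hypothetical inconsistent database: "alice" and "hr" each violate a key.
db = {
    "Employee": [("alice", "sales"), ("alice", "hr"), ("bob", "sales")],
    "Dept": [("sales", "paris"), ("hr", "paris"), ("hr", "rome")],
}
key_lens = {"Employee": 1, "Dept": 1}


def paris_employees(d):
    """SPJ query: names of employees whose department is located in Paris."""
    return {
        e[0]
        for e in d["Employee"]
        for dept in d["Dept"]
        if e[1] == dept[0] and dept[1] == "paris"
    }


print(consistent_answers(db, key_lens, paris_employees))  # {'bob'}
```

Only "bob" is a consistent answer: in the repair that keeps ("alice", "hr") and ("hr", "rome"), "alice" is not returned, so she is not guaranteed across all four repairs.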
Funder
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Cited by
1 article.