Affiliation:
1. Facebook, Menlo Park, CA
2. University of Washington, Seattle, WA
Abstract
We study the complexity of computing a query on a probabilistic database. We consider unions of conjunctive queries (UCQ), which are equivalent to positive existential first-order logic sentences, and also to nonrecursive datalog programs. The tuples in the database are independent random events. We prove the following dichotomy theorem: for every UCQ query, either its probability can be computed in time polynomial in the size of the database, or computing it is #P-hard. Our result also has applications to the problem of computing the probability of positive Boolean expressions, and establishes a dichotomy for such classes based on their structure. For the tractable case, we give a very simple algorithm that alternates between two steps: applying the inclusion/exclusion formula, and removing one existential variable. A key and novel feature of this algorithm is that it avoids computing terms that cancel out in the inclusion/exclusion formula; in other words, it computes only those terms whose Möbius function in an appropriate lattice is nonzero. We show that this simple feature is a key ingredient needed to ensure completeness. For the hardness proof, we give a reduction from the counting problem for positive, partitioned 2CNF, which is known to be #P-complete. The hardness proof is nontrivial, and combines techniques from logic, classical algebra, and analysis.
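The cancellation idea mentioned in the abstract can be illustrated on a small positive Boolean expression. The following is a minimal sketch, not the paper's algorithm: it assumes a positive DNF over independent Boolean variables, groups the inclusion/exclusion terms by the set of variables they mention, and skips every group whose signed coefficients sum to zero. The function names `dnf_probability` and `brute_force`, and the example formula, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations, product
from math import prod


def dnf_probability(clauses, p):
    """Probability that at least one clause holds.

    clauses: list of frozensets of variable names (each clause is a conjunction).
    p: dict mapping each variable to its marginal probability (independent).
    Returns (probability, net coefficient per variable set).
    """
    coeff = defaultdict(int)
    # Plain inclusion/exclusion, but terms are merged by the union of variables.
    for k in range(1, len(clauses) + 1):
        for subset in combinations(clauses, k):
            union = frozenset().union(*subset)
            coeff[union] += (-1) ** (k + 1)
    total = 0.0
    for union, c in coeff.items():
        if c == 0:
            continue  # this term cancels out, so its probability is never computed
        total += c * prod(p[v] for v in union)
    return total, dict(coeff)


def brute_force(clauses, p):
    """Reference value: sum the weights of all satisfying assignments."""
    names = sorted(p)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if any(all(world[v] for v in clause) for clause in clauses):
            total += prod(p[v] if world[v] else 1 - p[v] for v in names)
    return total


# Illustrative formula: (a AND b) OR (b AND c) OR (c AND d)
clauses = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
p = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
prob, coeff = dnf_probability(clauses, p)
```

In this example the term over all four variables receives the contributions -1 (from the pair of the first and third clauses) and +1 (from the triple), so its net coefficient is zero and it is skipped. The sketch still enumerates every subset of clauses to find the coefficients, so it shows only which terms cancel, not how to find them efficiently.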
Funder
Division of Information and Intelligent Systems
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software
Cited by
77 articles.
1. Applications and Computation of the Shapley Value in Databases and Machine Learning;Companion of the 2024 International Conference on Management of Data;2024-06-09
2. The Generalized Causal-Effect Score in Data Management (short paper);Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI;2024-06-09
3. When is Shapley Value Computation a Matter of Counting?;Proceedings of the ACM on Management of Data;2024-05-10
4. Expected Shapley-Like Scores of Boolean functions: Complexity and Applications to Probabilistic Databases;Proceedings of the ACM on Management of Data;2024-05-10
5. From Shapley Value to Model Counting and Back;Proceedings of the ACM on Management of Data;2024-05-10