Affiliation:
1. Bar-Ilan University, Israel
Abstract
We present a novel framework for uncertain data management, called ActivePDB. We are given a relational probabilistic database, where each tuple is correct with some probability; e.g., a database constructed from textual data using information extraction. We are now given a query and we want to determine the correctness of its results. Unlike probabilistic databases, we have an oracle that can resolve the uncertainty, such as a domain expert that can verify data against their sources. Since verification may be costly, our goal is to determine the correct output of the query, while asking the oracle to verify as few tuples as possible. ActivePDB provides an end-to-end solution to this problem. In a nutshell, we first track provenance to identify which input tuples contribute to the derivation of each output tuple, and in what ways. We then design an active learning solution to iteratively choose tuples to be verified based on the provenance structure and on an evolving estimation of the probability of the tuples correctness. We will demonstrate ActivePDB in the context of the NELL database of extracted facts, allowing participants to both pose queries and play the role of oracles.
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Hierarchical Crowdsourcing for Data Labeling with Heterogeneous Crowd;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04