Affiliations:
1. School of Computing Science, Simon Fraser University, Burnaby, Canada
2. Qatar Computing Research Institute, HBKU, Qatar
Abstract
One of the main challenges that data-cleaning systems face is to *automatically* identify and repair data errors in a *dependable* manner. Although data dependencies (also known as integrity constraints) have been widely studied as a way to capture errors in data, automated and dependable repair of these errors has remained a notoriously difficult problem. In this work, we introduce an automated approach for dependably repairing data errors, based on a novel class of *fixing rules*. A fixing rule contains an evidence pattern, a set of negative patterns, and a fact value. Fixing rules are, at heart, *deterministic*: given a tuple, the evidence pattern and the negative patterns of a fixing rule are combined to precisely capture which attribute is wrong, and the fact indicates how to correct this error. We study several fundamental problems associated with fixing rules and establish their complexity. We develop efficient algorithms to check whether a set of fixing rules is consistent and discuss approaches to resolve inconsistent fixing rules. We also devise efficient algorithms for repairing data errors using fixing rules. Moreover, we discuss how to generate a large number of fixing rules from examples or from available knowledge bases. Using both real-life and synthetic data, we experimentally demonstrate that our techniques outperform other automated algorithms in the accuracy of repairing data errors.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
Cited by 13 articles.