Affiliation:
1. Federal University of Technology, Campo Mourão, Paraná, Brazil
2. LNCC-DEXL, Petropolis, Rio de Janeiro, Brazil
3. Hasso Plattner Institute, University of Potsdam, Germany
Abstract
Denial constraints (DCs) are an integrity constraint formalism widely used to detect inconsistencies in data. Several algorithms have been devised to discover DCs from data, as manually specifying them is burdensome and, worse yet, error-prone. The existing algorithms follow two basic steps: building an intermediate data structure from records, then enumerating the DCs from that intermediate. However, current algorithms are often inefficient in computing these intermediates. Also, it is still unclear which enumeration algorithm performs best since some of the available algorithms have not yet been compared to each other.
In response, we present a set of new algorithms with improved design choices. We introduce a parallel pipeline for rapidly computing the intermediate using custom data representations, algorithms, and indexes. For DC enumeration, we propose an inverted index, pruning, and parallel search strategies. We present hybrid approaches that integrate our techniques with previous enumeration algorithms, improving their performance in many scenarios. Our experimental study shows that the proposed DC discovery algorithms are consistently much faster (up to an order of magnitude) than the current state-of-the-art.
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献