Affiliation:
1. University of Salerno, Fisciano (SA), Italy
Abstract
Functional dependencies (FDs) are one of the kinds of metadata used to assess data quality and to perform data cleaning operations. However, to achieve robustness with respect to data errors, it has been necessary to devise imprecise versions of functional dependencies, yielding relaxed functional dependencies (RFDs). Among them is the class of RFDs relaxing on the extent, i.e., those admitting the possibility that an FD holds only on a subset of the data. In the literature, several algorithms have been defined to automatically discover RFDs from big data collections. They achieve good performance with respect to the inherent complexity of the problem. However, most of them can discover RFDs only by batch processing the entire dataset. This is not suitable in the era of big data, where the size of a database instance can grow at high velocity and the insertion of new data can invalidate previously holding RFDs. Thus, it is necessary to devise incremental discovery algorithms capable of updating the set of holding RFDs upon data insertions, without processing the entire dataset. To this end, in this article we propose an incremental discovery algorithm for RFDs relaxing on the extent. It manages the validation of candidate RFDs and the generation of possibly new RFD candidates upon the insertion of new tuples, while limiting the size of the overall search space. Experimental results show that the proposed algorithm achieves extremely good performance on real-world datasets.
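To make the idea of validating a dependency "on a subset of data" under insertions concrete, the following is a minimal sketch and not the algorithm proposed in the article. It assumes a g3-style error measure (the fraction of tuples that would have to be removed for the FD to hold exactly) and a user-chosen threshold; the class name `ExtentRelaxedFD`, its methods, and the `max_error` parameter are hypothetical names introduced only for illustration. It covers only the re-validation of a single candidate upon insertion, not candidate generation or search-space pruning.

```python
from collections import Counter, defaultdict


class ExtentRelaxedFD:
    """Tracks one candidate dependency lhs -> rhs under tuple insertions.

    Assumption (not from the article): the extent is measured with a
    g3-style error, i.e. the fraction of tuples that must be dropped so
    that the FD holds exactly on the remaining ones.
    """

    def __init__(self, lhs, rhs, max_error):
        self.lhs = tuple(lhs)        # attribute names on the left-hand side
        self.rhs = rhs               # single right-hand-side attribute
        self.max_error = max_error   # hypothetical tolerance threshold
        self.groups = defaultdict(Counter)  # lhs value -> rhs value counts
        self.best = defaultdict(int)        # lhs value -> largest rhs count
        self.total = 0               # tuples seen so far
        self.kept = 0                # tuples that agree with the majority

    def insert(self, row):
        """Update the counters with one new tuple, without rescanning."""
        key = tuple(row[a] for a in self.lhs)
        counts = self.groups[key]
        counts[row[self.rhs]] += 1
        # A single insertion can raise the group's majority count by at most 1.
        if counts[row[self.rhs]] > self.best[key]:
            self.best[key] += 1
            self.kept += 1
        self.total += 1

    def error(self):
        """Fraction of tuples violating the dependency (0.0 when empty)."""
        if self.total == 0:
            return 0.0
        return (self.total - self.kept) / self.total

    def holds(self):
        """True if the dependency holds within the tolerated error."""
        return self.error() <= self.max_error


def demo():
    """Feed a few tuples and report whether zip -> city still holds."""
    dep = ExtentRelaxedFD(lhs=["zip"], rhs="city", max_error=0.2)
    history = []
    rows = [
        {"zip": "84084", "city": "Fisciano"},
        {"zip": "84084", "city": "Fisciano"},
        {"zip": "84100", "city": "Salerno"},
        {"zip": "84084", "city": "Salerno"},   # conflicting tuple
    ]
    for row in rows:
        dep.insert(row)
        history.append(dep.holds())
    return dep, history
```

Each insertion touches only the group of tuples sharing the new tuple's left-hand-side value, so the check is constant-time per tuple here; this illustrates why an insertion can invalidate a previously holding dependency without requiring a pass over the whole dataset.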
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
Cited by
13 articles.