Affiliation:
1. University of Edinburgh
2. Harbin Institute of Technology
3. Beihang University
Abstract
We present CerFix, a data cleaning system that finds certain fixes for tuples at the point of data entry, i.e., fixes that are guaranteed correct. It is based on master data, editing rules and certain regions. Given some attributes of an input tuple that are validated (assured correct), editing rules tell us what other attributes to fix and how to correct them with master data. A certain region is a set of attributes that, if validated, warrant a certain fix for the entire tuple. We demonstrate the following facilities provided by CerFix: (1) a region finder to identify certain regions; (2) a data monitor to find certain fixes for input tuples, by guiding users to validate a minimal number of attributes; and (3) an auditing module to show what attributes are fixed and where the correct values come from.
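The interplay of validated attributes, editing rules and master data described above can be illustrated with a minimal Python sketch. It is not CerFix's implementation: the `EditingRule` class, the `certain_fix` function, the attribute names and the sample master data are all hypothetical, and the rule semantics are simplified to "if the validated attributes match exactly one master value, copy that value and mark the target as validated".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingRule:
    """Hypothetical rule: if the input attributes in `match` are validated and
    equal the paired master attributes, copy the master value named in `fix`."""
    match: tuple  # pairs of (input_attr, master_attr)
    fix: tuple    # one pair (input_attr, master_attr)


def certain_fix(tup, validated, rules, master):
    """Apply rules until nothing changes.

    Returns the fixed tuple, the set of validated attributes, and a provenance
    map from each fixed attribute to the master row that supplied its value.
    """
    tup = dict(tup)
    validated = set(validated)
    provenance = {}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            target, source = rule.fix
            # Skip attributes that are already assured correct.
            if target in validated:
                continue
            # A rule may fire only from validated attributes.
            if not all(attr in validated for attr, _ in rule.match):
                continue
            rows = [m for m in master
                    if all(tup.get(a) == m.get(b) for a, b in rule.match)]
            values = {m[source] for m in rows}
            # Fix only when the master data gives a single, unambiguous value.
            if len(values) == 1:
                tup[target] = values.pop()
                validated.add(target)
                provenance[target] = rows[0]
                changed = True
    return tup, validated, provenance


# Hypothetical master data and rules, for illustration only.
master = [
    {"zip": "EH8 9AB", "city": "Edinburgh", "area_code": "131"},
    {"zip": "W1B 1JH", "city": "London", "area_code": "020"},
]
rules = [
    EditingRule(match=(("zip", "zip"),), fix=("city", "city")),
    EditingRule(match=(("zip", "zip"),), fix=("AC", "area_code")),
]

entered = {"zip": "EH8 9AB", "city": "Ldn", "AC": "020"}
fixed, validated, provenance = certain_fix(entered, {"zip"}, rules, master)

# In this toy setting {"zip"} behaves like a certain region: validating it
# alone is enough to validate every attribute of the tuple.
is_certain = validated == set(entered)
```

In this sketch the `provenance` map plays the role the abstract assigns to auditing: for each fixed attribute it records which master row the corrected value came from.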
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
6 articles.
1. Splitting Tuples of Mismatched Entities;Proceedings of the ACM on Management of Data;2023-12-08
2. Cleanix;ACM SIGMOD Record;2016-05-09
3. Query-Oriented Data Cleaning with Oracles;Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data;2015-05-27
4. Cleanix;Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management;2014-11-03
5. The data analytics group at the Qatar Computing Research Institute;ACM SIGMOD Record;2013-01-17