Affiliation:
1. Pontificia Universidad Católica de Chile
Abstract
In this paper, we propose a simple and expressive framework for adding metadata to CSV documents and their noisy variants. The framework is based on annotating parts of the document that can later be used to read, query, or exchange the data. The core of our framework is a language based on extended regular expressions that are used for selecting data. These expressions are then combined using a set of rules in order to annotate the data. We study the computational complexity of implementing our framework and present an efficient evaluation algorithm that runs in time proportional to its output and linear in its input. As a proof of concept, we test an implementation of our framework against a large number of real-world datasets and show that it can be used efficiently in practice.
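The idea of selecting data with regular expressions and combining the selections through rules can be illustrated with a minimal sketch. It is not the language or the algorithm defined in the paper: the rule names, the patterns, the sample text, and the line-level granularity are all assumptions made for illustration, using only Python's standard `re` module.

```python
import re

# Hypothetical rules: (annotation label, pattern applied to one line).
# Labels and patterns are illustrative assumptions, not the paper's language.
RULES = [
    ("comment", re.compile(r"^#.*$")),
    ("header", re.compile(r"^[A-Za-z_]+(,[A-Za-z_]+)*$")),
    ("record", re.compile(r"^[^,#]*(,[^,]*)+$")),
]


def annotate(text):
    """Return (line_number, label, line) using the first rule that matches each line."""
    annotations = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RULES:
            if pattern.match(line):
                annotations.append((number, label, line))
                break
        else:
            # No rule selected this line.
            annotations.append((number, "unknown", line))
    return annotations


# A small, made-up "noisy" CSV: a comment line precedes the header.
SAMPLE = "# exported 2020\nname,age\nAna,34\nLuis,41\n"

if __name__ == "__main__":
    for number, label, line in annotate(SAMPLE):
        print(number, label, line)
```

Rules are tried in order and the first match wins, so the more specific patterns (comment, header) are listed before the general record pattern.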
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
12 articles.
1. Pollock: A Data Loading Benchmark; Proceedings of the VLDB Endowment; 2023-04
2. Face Recognition Based Automated Attendance Management System; International Journal of Scientific Research in Science and Technology; 2022-02-08
3. Ensuring Data Readiness for Quality Requirements with Help from Procedure Reuse; Journal of Data and Information Quality; 2021-04-27
4. Pytheas; Proceedings of the VLDB Endowment; 2020-08
5. Establishing the Syntactic Rules of the Kankana-ey Dialect using TensorFlow; IOP Conference Series: Materials Science and Engineering; 2020-04-01