Affiliation:
1. University of California, Santa Cruz, CA
2. Universitat Pompeu Fabra, Barcelona, Spain
3. University of California, Santa Cruz and IBM Research--Almaden
Abstract
A schema mapping is a high-level specification of the relationship between a source schema and a target schema. Recently, a line of research has emerged that aims at deriving schema mappings automatically or semi-automatically with the help of data examples, that is, pairs consisting of a source instance and a target instance that depict, in some precise sense, the intended behavior of the schema mapping. Several different uses of data examples for deriving, refining, or illustrating a schema mapping have already been proposed and studied.
In this article, we use the lens of computational learning theory to systematically investigate the problem of obtaining algorithmically a schema mapping from data examples. Our aim is to leverage the rich body of work on learning theory in order to develop a framework for exploring the power and the limitations of the various algorithmic methods for obtaining schema mappings from data examples. We focus on GAV schema mappings, that is, schema mappings specified by GAV (Global-As-View) constraints. GAV constraints are the most basic and the most widely supported language for specifying schema mappings. We present an efficient algorithm for learning GAV schema mappings using Angluin's model of exact learning with membership and equivalence queries. This is optimal, since we show that neither membership queries nor equivalence queries suffice, unless the source schema consists of unary relations only. We also obtain results concerning the learnability of schema mappings in the context of Valiant's well-known PAC (Probably-Approximately-Correct) learning model, and concerning the learnability of restricted classes of GAV schema mappings. Finally, as a byproduct of our work, we show that there is no efficient algorithm for approximating the shortest GAV schema mapping fitting a given set of examples, unless the source schema consists of unary relations only.
Funder
Ministerio de Ciencia e Innovación
Division of Information and Intelligent Systems
Publisher
Association for Computing Machinery (ACM)
Cited by
20 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. On the non-efficient PAC learnability of conjunctive queries;Information Processing Letters;2024-01
2. Extremal Fitting Problems for Conjunctive Queries;Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems;2023-06-18
3. Conjunctive Queries: Unique Characterizations and Exact Learnability;ACM Transactions on Database Systems;2022-11-06
4. On the Parameterized Complexity of Learning First-Order Logic;Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems;2022-06-12
5. Scalable and Usable Relational Learning With Automatic Language Bias;Proceedings of the 2021 International Conference on Management of Data;2021-06-09