Characterizing schema mappings via data examples

Author:

Bogdan Alexe (1), Balder ten Cate (2), Phokion G. Kolaitis (3), Wang-Chiew Tan (4)

Affiliation:

1. IBM Research - Almaden, San Jose, CA

2. University of California, Santa Cruz, Santa Cruz, CA

3. University of California, Santa Cruz, CA and IBM Research - Almaden, San Jose, CA

4. IBM Research - Almaden, San Jose, CA and University of California, Santa Cruz, Santa Cruz, CA

Abstract

Schema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research investigations. Since in real-life applications schema mappings can be quite complex, it is important to develop methods and tools for understanding, explaining, and refining schema mappings. A promising approach to this effect is to use “good” data examples that illustrate the schema mapping at hand. We develop a foundation for the systematic investigation of data examples and obtain a number of results on both the capabilities and the limitations of data examples in explaining and understanding schema mappings. We focus on schema mappings specified by source-to-target tuple generating dependencies (s-t tgds) and investigate the following problem: which classes of s-t tgds can be “uniquely characterized” by a finite set of data examples? Our investigation begins by considering finite sets of positive and negative examples, which are arguably the most natural choice of data examples. However, we show that they are not powerful enough to yield interesting unique characterizations. We then consider finite sets of universal examples, where a universal example is a pair consisting of a source instance and a universal solution for that source instance. We first show that the existence of unique characterizations via universal examples is, in a precise sense, equivalent to the existence of Armstrong bases (a relaxation of the classical notion of Armstrong databases). After this, we show that every schema mapping specified by LAV s-t tgds is uniquely characterized by a finite set of universal examples with respect to the class of LAV s-t tgds. Moreover, this positive result extends to the much broader classes of n-modular schema mappings, for any positive integer n. Finally, we study the unique characterizability of GAV schema mappings. It turns out that some GAV schema mappings are uniquely characterizable by a finite set of universal examples with respect to the class of GAV s-t tgds, while others are not. By unveiling a tight connection with homomorphism dualities, we establish an effective, sound, and complete criterion for determining whether or not a GAV schema mapping is uniquely characterizable by a finite set of universal examples with respect to the class of GAV s-t tgds.
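As a rough illustration of what a universal example looks like, the sketch below builds one for a single made-up LAV s-t tgd, Emp(name, dept) -> EXISTS m. Works(name, dept) AND Mgr(dept, m). The relation names, the sample data, and the helper `chase_lav` are hypothetical and not taken from the paper; the construction is a naive chase that introduces one fresh labeled null per source fact, which is a standard way to obtain a universal solution and is not claimed to be the authors' procedure.

```python
from itertools import count

# Hypothetical LAV s-t tgd (illustration only, not from the paper):
#   Emp(name, dept) -> EXISTS m. Works(name, dept) AND Mgr(dept, m)

def chase_lav(source):
    """Naively chase the illustrative tgd over a set of source facts.

    Facts are (relation_name, tuple_of_values) pairs. Each Emp fact
    yields a Works fact and a Mgr fact whose second value is a fresh
    labeled null standing for the existential variable m.
    """
    fresh = count(1)
    target = set()
    for relation, values in sorted(source):
        if relation != "Emp":
            continue  # the tgd only mentions Emp on the source side
        name, dept = values
        null = f"_N{next(fresh)}"  # fresh labeled null
        target.add(("Works", (name, dept)))
        target.add(("Mgr", (dept, null)))
    return target

# A universal example: a source instance paired with a universal solution.
source = {("Emp", ("alice", "db")), ("Emp", ("bob", "db"))}
universal_example = (source, chase_lav(source))
```

The pair `universal_example` is the kind of object the abstract calls a universal example; whether a finite set of such pairs pins down a mapping uniquely is the question the paper studies.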

Funder

Division of Information and Intelligent Systems

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems


Cited by 31 articles.

1. Characterising Modal Formulas with Examples;ACM Transactions on Computational Logic;2024-04-16

2. Exploring Data Preparation Modules by Examples;Lecture Notes in Computer Science;2024

3. Extremal Fitting Problems for Conjunctive Queries;Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems;2023-06-18

4. GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example;Proceedings of the ACM on Management of Data;2023-06-13

5. Fitting Algorithms for Conjunctive Queries;ACM SIGMOD Record;2023-01-19
