Affiliation:
1. University of Edinburgh and Lucent Technologies
2. Yahoo! Research
Abstract
A fundamental concern of data integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source documents is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be derived from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also outline efficient heuristic algorithms to find candidate embeddings, which our experimental study shows to be effective. These yield the first systematic and effective approach to finding information-preserving XML mappings.
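The edge-to-path idea in the abstract can be illustrated with a minimal sketch. It assumes each DTD schema is reduced to a plain directed graph of element types and checks only that every source edge is mapped to a path in the target whose endpoints agree with the node mapping. The names `Graph`, `is_path`, `maps_edges_to_paths`, and the toy schemas are hypothetical, and the check is not the paper's full definition of schema embedding, which imposes further conditions not modeled here.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]   # element type -> set of child element types
Edge = Tuple[str, str]        # (parent, child)


def is_path(graph: Graph, path: List[str]) -> bool:
    """True if `path` has at least one edge and each step is an edge of `graph`."""
    return len(path) >= 2 and all(
        b in graph.get(a, set()) for a, b in zip(path, path[1:])
    )


def maps_edges_to_paths(
    source: Graph,
    target: Graph,
    node_map: Dict[str, str],
    path_map: Dict[Edge, List[str]],
) -> bool:
    """Check the simplified condition: every source edge (a, b) is sent to a
    target path that starts at node_map[a] and ends at node_map[b]."""
    for parent, children in source.items():
        for child in children:
            path = path_map.get((parent, child))
            if not path or not is_path(target, path):
                return False
            if path[0] != node_map[parent] or path[-1] != node_map[child]:
                return False
    return True


# Toy schemas (assumed for illustration only).
source = {"db": {"course"}, "course": set()}
target = {"school": {"courses"}, "courses": {"item"}, "item": set()}
node_map = {"db": "school", "course": "item"}

# The single source edge db -> course is mapped to a two-step target path.
good = {("db", "course"): ["school", "courses", "item"]}
# This candidate skips "courses", so it is not a path in the target graph.
bad = {("db", "course"): ["school", "item"]}

print(maps_edges_to_paths(source, target, node_map, good))
print(maps_edges_to_paths(source, target, node_map, bad))
```

Verifying a given candidate in this way is cheap. The hardness result stated in the abstract concerns finding an embedding, not checking one.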
Funder
Biotechnology and Biological Sciences Research Council
National Natural Science Foundation of China
EPSRC
Publisher
Association for Computing Machinery (ACM)
Cited by
28 articles.