Affiliation:
1. PUC Chile and University of Oxford, UK
2. Hasselt University and Transnational University of Limburg, Diepenbeek, Belgium
3. PUC Chile, Chile
4. Université Libre de Bruxelles (ULB)
Abstract
A great deal of research into the learning of schemas from XML data has been conducted in recent years to enable the automatic discovery of XML schemas from XML documents when no schema or only a low-quality one is available. Unfortunately, and in strong contrast to, for instance, the relational model, the automatic discovery of even the simplest of XML constraints, namely XML keys, has been left largely unexplored in this context. A major obstacle here is the unavailability of a theory on reasoning about XML keys in the presence of XML schemas, which is needed to validate the quality of candidate keys. The present article embarks on a fundamental study of such a theory and classifies the complexity of several crucial properties concerning XML keys in the presence of an XSD, such as testing for consistency, boundedness, satisfiability, universality, and equivalence. Of independent interest, novel results are obtained related to cardinality estimation of XPath result sets. A mining algorithm is then developed within the framework of levelwise search. The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the properties mentioned before to assess and refine the quality of derived keys. An experimental study on an extensive body of real-world XML data evaluating the effectiveness of the proposed algorithm is provided.
Funder
Hercules Foundation and the Flemish Government
Fondecyt
ERC
Publisher
Association for Computing Machinery (ACM)
Cited by
6 articles.
1. References;Data Lakes;2020-04-15
2. Dependencies for Graphs;ACM Transactions on Database Systems;2019-04-08
3. Probabilistic Keys;IEEE Transactions on Knowledge and Data Engineering;2017-03-01
4. An Incremental Learner for Language-Based Anomaly Detection in XML;2016 IEEE Security and Privacy Workshops (SPW);2016-05
5. FRWSC: a framework for robust Web service composition;Service Oriented Computing and Applications;2016-04-05