Exploring XML web collections with DescribeX

Published:2010-07 Issue:3 Volume:4 Page:1-46
ISSN:1559-1131
Container-title:ACM Transactions on the Web
language:en
Short-container-title:ACM Trans. Web

Author:

Consens Mariano P.¹, Miller Renée J.¹, Rizzolo Flavio², Vaisman Alejandro A.³

Affiliation:

1. University of Toronto, Toronto, Canada

2. University of Ottawa and Carleton University

3. Universidad de Buenos Aires, Buenos Aires, Argentina

Abstract

As Web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of Web documents. Hence, applications cannot rely solely on schemas to provide the complex knowledge needed to visualize, use, query and manage documents. Even when XML Web documents are valid with regard to a schema, the actual structure of such documents may exhibit significant variations across collections for several reasons: the schema may be very lax (e.g., RSS feeds), the schema may be large and different subsets of it may be used in different documents (e.g., industry standards like UBL), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). For these reasons, many applications that incorporate XPath queries to process a large Web document collection require an understanding of the actual structure present in the collection, and not just the schema.

To support modern Web applications, we introduce DescribeX, a powerful framework that is capable of describing complex XML summaries of Web collections. DescribeX supports the construction of heterogeneous summaries that can be declaratively defined and refined by means of axis path regular expressions (AxPREs). AxPREs provide the flexibility necessary for declaratively defining complex mappings between instance nodes (in the documents) and summary nodes. These mappings are capable of expressing order and cardinality, among other properties, which can significantly help in the understanding of the structure of large collections of XML documents and enhance the performance of Web applications over these collections. DescribeX captures most summary proposals in the literature by providing (for the first time) a common declarative definition for them. Experimental results demonstrate the scalability of DescribeX summary operations (summary creation, as well as refinement and stabilization, two key enablers for tailoring summaries) on multi-gigabyte Web collections.
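To make the idea of mapping instance nodes to summary nodes concrete, the following is a minimal sketch, not the DescribeX implementation and not AxPRE syntax. It assumes the simplest kind of structural summary, in which every element is assigned to a summary node identified by its root-to-node label path, and it records how many instance nodes fall into each summary node. The function name `label_path_summary` and the sample RSS-like documents are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def label_path_summary(xml_documents):
    """Group element nodes by their root-to-node label path.

    Hypothetical helper: each distinct label path plays the role of a
    summary node, and the returned count is the number of instance
    nodes (elements) mapped to it across the whole collection.
    """
    extents = defaultdict(int)

    def visit(element, parent_path):
        # The summary node for an element is its parent's path plus its own tag.
        path = parent_path + "/" + element.tag
        extents[path] += 1
        for child in element:
            visit(child, path)

    for text in xml_documents:
        visit(ET.fromstring(text), "")
    return dict(extents)


# Two small documents that share a lax, RSS-like structure but differ in detail.
docs = [
    "<rss><channel><title/><item><title/></item>"
    "<item><title/><enclosure/></item></channel></rss>",
    "<rss><channel><title/><item><title/><link/></item></channel></rss>",
]

summary = label_path_summary(docs)
for path, count in sorted(summary.items()):
    print(path, count)
```

Even on this toy collection the summary shows structure that a lax schema would not reveal: `enclosure` and `link` each occur under only one of the three `item` elements. The summaries described in the abstract go further than this sketch, for example by expressing order and cardinality and by allowing different parts of a summary to be refined differently.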

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications

Link

https://dl.acm.org/doi/pdf/10.1145/1806916.1806920

References: 73 articles.

1. DescribeX: Interacting with AxPRE Summaries

Cited by 3 articles.

1. RDF graph summarization for first-sight structure discovery;The VLDB Journal;2020-04-30

2. Parallel quotient summarization of RDF graphs;Proceedings of the International Workshop on Semantic Big Data - SBD '19;2019

3. Summarizing semantic graphs: a survey;The VLDB Journal;2018-12-03