Affiliation:
1. University of A Coruña, Spain
2. University of Chile, Chile
Abstract
The
eXtensible Markup Language
(XML) is acknowledged as the
de facto
standard for semistructured data representation and data exchange on the Web and many other scenarios. A well-known shortcoming of XML is its verbosity, which increases manipulation, transmission, and processing costs. Various structure-blind and structure-conscious compression techniques can be applied to XML, and some are even access-friendly, meaning that the documents can be efficiently accessed in compressed form. Direct access is necessary to implement the query languages XPath and XQuery, which are the standard ones to exploit the expressiveness of XML. While a good deal of theoretical and practical proposals exist to solve XPath/XQuery operations on XML, only a few ones are well integrated with a compression format that supports the required access operations on the XML data. In this work we go one step further and design a compression format for XML collections that
boosts
the performance of XPath queries on the data. This is done by designing compressed representations of the XML data that support some complex operations apart from just accessing the data, and those are exploited to solve key components of the XPath queries. Our system, called XXS, is aimed at XML collections containing natural language text, which are compressed to within 35%--50% of their original size while supporting a large subset of XPath operations in time competitive with, and many times outperforming, the best state-of-the-art systems that work on uncompressed representations.
Funder
MINECO
Fondecyt
MICINN
ITC-20133062 (for the Spanish group)
Xunta de Galicia
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications,General Business, Management and Accounting,Information Systems
Reference61 articles.
1. Using structural contexts to compress semistructured text collections
2. XQueC
3. R. A. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. Addison-Wesley Longman. R. A. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. Addison-Wesley Longman.
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Managing Compressed Structured Text;Encyclopedia of Database Systems;2018
2. Managing Compressed Structured Text;Encyclopedia of Database Systems;2017