Path sharing and predicate evaluation for high-performance XML filtering

Author:

Diao Yanlei (1), Altinel Mehmet (2), Franklin Michael J. (1), Zhang Hao (1), Fischer Peter (3)

Affiliation:

1. University of California, Berkeley, Berkeley, California

2. IBM Almaden Research Center, San Jose, California

3. University of Heidelberg, Heidelberg, Germany

Abstract

XML filtering systems aim to provide fast, on-the-fly matching of XML-encoded data to large numbers of query specifications containing constraints on both structure and content. It is now well accepted that approaches using event-based parsing and Finite State Machines (FSMs) can provide the basis for highly scalable structure-oriented XML filtering systems. The XFilter system [Altinel and Franklin 2000] was the first published FSM-based XML filtering approach. XFilter used a separate FSM per path query and a novel indexing mechanism to allow all of the FSMs to be executed simultaneously during the processing of a document. Building on the insights of the XFilter work, we describe a new method, called "YFilter," that combines all of the path queries into a single Nondeterministic Finite Automaton (NFA). YFilter exploits commonality among queries by merging common prefixes of the query paths such that they are processed at most once. The resulting shared processing provides tremendous improvements in structure matching performance but complicates the handling of value-based predicates.

In this article, we first describe the XFilter and YFilter approaches and present results of a detailed performance comparison of structure matching for these algorithms as well as a hybrid approach. The results show that the path sharing employed by YFilter can provide order-of-magnitude performance benefits. We then propose two alternative techniques for extending YFilter's shared structure matching with support for value-based predicates, and compare the performance of these two techniques. The results of this latter study demonstrate some key differences between shared XML filtering and traditional database query processing. Finally, we describe how the YFilter approach is extended to handle more complicated queries containing nested path expressions.
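
The prefix-sharing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small path subset (child steps `/`, descendant steps `//`, and the `*` wildcard), leaves out value-based predicates and nested paths, and uses Python's standard `xml.sax` event parser. The names `State`, `add_query`, `closure`, `FilterHandler`, and `match` are hypothetical and chosen only for illustration.

```python
import re
import xml.sax


class State:
    """One NFA state; queries with a common prefix share the same states."""

    def __init__(self, self_loop=False):
        self.children = {}          # element name or "*" -> next State
        self.descendant = None      # epsilon edge to the "//" state, if any
        self.self_loop = self_loop  # True only for "//" states
        self.query_ids = []         # queries accepted when this state is reached


def add_query(root, query_id, path):
    """Merge one path query into the shared automaton."""
    state = root
    for axis, name in re.findall(r"(//|/)([^/]+)", path):
        if axis == "//":
            if state.descendant is None:
                state.descendant = State(self_loop=True)
            state = state.descendant
        if name not in state.children:
            state.children[name] = State()
        state = state.children[name]
    state.query_ids.append(query_id)


def closure(states):
    """Follow epsilon edges; "//" states never carry one, so one hop suffices."""
    result = set(states)
    for s in states:
        if s.descendant is not None:
            result.add(s.descendant)
    return result


class FilterHandler(xml.sax.ContentHandler):
    """Runs the NFA over parse events, keeping one active-state set per open element."""

    def __init__(self, root):
        super().__init__()
        self.stack = [closure({root})]
        self.matched = set()

    def startElement(self, name, attrs):
        active = set()
        for s in self.stack[-1]:
            for key in (name, "*"):
                if key in s.children:
                    active.add(s.children[key])
            if s.self_loop:
                active.add(s)
        active = closure(active)
        for s in active:
            self.matched.update(s.query_ids)
        self.stack.append(active)

    def endElement(self, name):
        self.stack.pop()  # backtrack to the parent element's active states


def match(queries, document):
    """Return the sorted ids of all queries matched by the document (bytes)."""
    root = State()
    for query_id, path in queries.items():
        add_query(root, query_id, path)
    handler = FilterHandler(root)
    xml.sax.parseString(document, handler)
    return sorted(handler.matched)
```

In this sketch, the queries `/a/b` and `/a/b/c` share the states for `a` and `b`, so each start-element event advances the common prefix once regardless of how many queries contain it. The stack of active-state sets lets the end-element event restore the parent's states without recomputing them.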

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

References (45 articles)

1. DBIS-toolkit

2. Apache XML Project. 1999. Xerces Java parser 1.2.3 Release. http://xml.apache.org/xerces-j/index.html.

3. Information filtering and information retrieval

Cited by 156 articles.

1. Nona: A Framework for Elastic Stream Provenance. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), 2024-07-23.

2. Exploiting Structure in Regular Expression Queries. Proceedings of the ACM on Management of Data, 2023-06-13.

3. A-Tree: A Dynamic Data Structure for Efficiently Indexing Arbitrary Boolean Expressions. Proceedings of the 2021 International Conference on Management of Data, 2021-06-09.

4. A survey on semi-structured web data manipulations by non-expert users. Computer Science Review, 2021-05.

5. Hardware/Software Co-design for XML-Document Processing. Advances in Computer Science for Engineering and Education III, 2020-08-06.
