XML clustering: a review of structural approaches

Published: 2014-10-29 Issue: 3 Volume: 30 Page: 297-323
ISSN: 0269-8889
Container-title: The Knowledge Engineering Review
Language: en

Author:

Piernik Maciej, Brzezinski Dariusz, Morzy Tadeusz, Lesniewska Anna

Abstract

With its presence in data integration, chemistry, biological, and geographic systems, eXtensible Markup Language (XML) has become an important standard not only in computer science. A common problem among the mentioned applications involves structural clustering of XML documents, an issue that has been thoroughly studied and led to the creation of a myriad of approaches. In this paper, we present a comprehensive review of structural XML clustering. First, we provide a basic introduction to the problem and highlight the main challenges in this research area. Subsequently, we divide the problem into three subtasks and discuss the most common document representations, structural similarity measures, and clustering algorithms. In addition, we present the most popular evaluation measures, which can be used to estimate clustering quality. Finally, we analyze and compare 23 state-of-the-art approaches and arrange them in an original taxonomy. By providing an up-to-date analysis of existing structural XML clustering algorithms, we hope to showcase methods suitable for current applications and draw lines of future research.

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Software

References: 97 articles.

1. OPTICS

2. Zhu Y.-W., Ji G.-L. & Sun Q.-H. 2010. Clustering GML documents using maximal frequent induced subtrees. In Proceedings of the 7th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD’10, 5, 2265–2269.

3. Tag Name Structure-based Clustering of XML Documents

4. Murray-Rust P. & Rzepa H. 1995. Chemical Markup Language, http://www.xml-cml.org/.

5. Guerrini G., Mesiti M. & Sanz I. 2007. An overview of similarity measures for clustering XML documents. In Web Data Management Practices: Emerging Techniques and Technologies, chapter 3, Vakali, A. & Pallis, G. (eds), 56–78. Idea Group Inc. (IGI).

Cited by 10 articles.

1. Leveraging Structural and Semantic Measures for JSON Document Clustering;JUCS - Journal of Universal Computer Science;2023-03-28

2. Data clustering: application and trends;Artificial Intelligence Review;2022-11-27

3. JSON document clustering based on schema embeddings;Journal of Information Science;2022-09-12

4. Data-driven assessment of structural evolution of RDF graphs;Semantic Web;2020-08-25

5. TreeXP—An Instantiation of XPattern Framework;Data Science and Security;2020-08-01