Tree model guided candidate generation for mining frequent subtrees from XML documents

Authors:

Henry Tan¹, Fedja Hadzic¹, Tharam S. Dillon¹, Elizabeth Chang², Ling Feng³

Affiliation:

1. Univ. of Technology Sydney, Australia

2. Curtin Univ. of Technology, Perth, Australia

3. Tsinghua Univ., China

Abstract

Due to the inherent flexibility of XML in both structure and semantics, mining association rules from XML faces several challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted, labeled, ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables an efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has lower complexity than the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and efficiency of the proposed techniques.
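The abstract names the ingredients of TMG candidate generation (a tree stored in pre-order, an embedding list per node, extensions that only produce candidates that actually occur in the data, and a maximum level of embedding constraint) without showing how they fit together. The Python below is a minimal sketch of that idea under our own assumptions, not the authors' MB3-Miner or iMB3-Miner. Assumptions: the names `PreorderTree`, `extend`, `encode` and `mine` are hypothetical; a node's embedding list is taken to be all of its descendants in pre-order; an occurrence is grown only by nodes that come after its rightmost node inside the root's embedding list; the level of embedding of a node is its depth difference to its parent in the pattern, so a maximum level of 1 is assumed to yield induced subtrees; `/` marks a backtrack in the string encoding; and support counts the trees that contain a pattern.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Set, Tuple

Occ = Tuple[int, Tuple[int, ...]]  # (tree id, pre-order positions of the matched nodes)


class PreorderTree:
    """Rooted, labeled, ordered tree stored in pre-order (hypothetical helper)."""

    def __init__(self, labels: Sequence[str], parents: Sequence[int]) -> None:
        # Assumes parents[0] == -1 and parents[i] < i, i.e. nodes are listed in pre-order.
        n = len(labels)
        self.labels = list(labels)
        self.depth = [0] * n
        for i in range(1, n):
            self.depth[i] = self.depth[parents[i]] + 1
        # scope[i] is the pre-order index of the last descendant of node i.
        self.scope = list(range(n))
        for i in range(n - 1, 0, -1):
            self.scope[parents[i]] = max(self.scope[parents[i]], self.scope[i])

    def embedding_list(self, i: int) -> range:
        """All descendants of node i, in pre-order (assumed embedding list)."""
        return range(i + 1, self.scope[i] + 1)

    def is_ancestor(self, a: int, b: int) -> bool:
        return a < b <= self.scope[a]


def pattern_parent(t: PreorderTree, nodes: Sequence[int], j: int) -> int:
    # The deepest matched node that is an ancestor of j becomes j's parent in the pattern.
    return max(u for u in nodes if t.is_ancestor(u, j))


def extend(t: PreorderTree, nodes: Tuple[int, ...],
           max_level: Optional[int]) -> Iterator[Tuple[int, ...]]:
    """Grow an occurrence by one node taken from the root's embedding list."""
    root, last = nodes[0], nodes[-1]
    for j in t.embedding_list(root):
        if j <= last:
            continue  # only nodes after the rightmost one, so no occurrence is built twice
        level = t.depth[j] - t.depth[pattern_parent(t, nodes, j)]
        if max_level is None or level <= max_level:
            yield nodes + (j,)


def encode(t: PreorderTree, nodes: Tuple[int, ...]) -> str:
    """Pre-order label string of the pattern; '/' marks a backtrack to a parent."""
    pdepth = {nodes[0]: 0}
    parts, prev = [t.labels[nodes[0]]], 0
    for idx in range(1, len(nodes)):
        j = nodes[idx]
        d = pdepth[pattern_parent(t, nodes[:idx], j)] + 1
        parts += ["/"] * (prev - d + 1) + [t.labels[j]]
        pdepth[j], prev = d, d
    return " ".join(parts)


def mine(db: List[PreorderTree], min_sup: int, max_size: int,
         max_level: Optional[int] = None) -> Dict[str, int]:
    """Level-wise miner; only occurrences of frequent patterns are extended."""
    occs: List[Occ] = [(tid, (i,)) for tid, t in enumerate(db) for i in range(len(t.labels))]
    result: Dict[str, int] = {}
    for _ in range(max_size):
        encoded = [(tid, nodes, encode(db[tid], nodes)) for tid, nodes in occs]
        tids: Dict[str, Set[int]] = defaultdict(set)
        for tid, _nodes, enc in encoded:
            tids[enc].add(tid)
        frequent = {enc for enc, s in tids.items() if len(s) >= min_sup}
        result.update({enc: len(tids[enc]) for enc in frequent})
        occs = [(tid, ext) for tid, nodes, enc in encoded if enc in frequent
                for ext in extend(db[tid], nodes, max_level)]
    return result


if __name__ == "__main__":
    # Tree 0 is a(b, c(b)); tree 1 is a(c(b)).
    db = [PreorderTree(["a", "b", "c", "b"], [-1, 0, 0, 2]),
          PreorderTree(["a", "c", "b"], [-1, 0, 1])]
    print(mine(db, min_sup=2, max_size=3))               # embedded: includes "a b"
    print(mine(db, min_sup=2, max_size=3, max_level=1))  # induced: "a b" is dropped
```

Because every candidate is an occurrence grown inside the data tree, the sketch never generates a pattern that has no occurrence, which is the property the abstract contrasts with join-based generation. In the example, "a b" is frequent as an embedded subtree because `a` is an ancestor of `b` in both trees, but with a maximum level of 1 it is infrequent, since in the second tree `b` is a grandchild of `a`.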

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 21 articles.

1. From Homomorphisms to Embeddings: A Novel Approach for Mining Embedded Patterns from Large Tree Data. Big Data Research, 2018-12.

2. Frequent subtree mining on the automata processor. Proceedings of the International Conference on Supercomputing (ICS '17), 2017.

3. Homomorphic Pattern Mining from a Single Large Data Tree. Data Science and Engineering, 2016-12.

4. Mining rooted ordered trees under subtree homeomorphism. Data Mining and Knowledge Discovery, 2015-10-19.

5. Leveraging Homomorphisms and Bitmaps to Enable the Mining of Embedded Patterns from Large Data Trees. Database Systems for Advanced Applications, 2015.
