Affiliation:
1. Univ. of Technology Sydney, Australia
2. Curtin Univ. of Technology, Perth, Australia
3. Tsinghua Univ., China
Abstract
Because XML is flexible in both structure and semantics, mining association rules from XML faces several challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity than the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets, against two well-known algorithms for mining induced and embedded subtrees, demonstrate the effectiveness and efficiency of the proposed techniques.
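The distinction between induced and embedded ordered subtrees can be illustrated with a small brute-force check. The sketch below is not MB3-Miner, iMB3-Miner, or TMG candidate generation; it is only a naive occurrence test written for illustration. It assumes that the level of embedding between an ancestor and a descendant is the difference in their depths, so that a maximum level of 1 corresponds to induced subtrees and larger values admit embedded ones. The pre-order `(label, parent index)` encoding and the names `occurs` and `_depths_and_scopes` are hypothetical choices, not taken from the article.

```python
from typing import List, Tuple

# A tree is a pre-order list of (label, parent index); the root has parent -1.
Tree = List[Tuple[str, int]]


def _depths_and_scopes(tree: Tree):
    """Return each node's depth and the pre-order index of its last descendant."""
    n = len(tree)
    depth = [0] * n
    last = list(range(n))
    for i, (_, parent) in enumerate(tree):
        if parent >= 0:
            depth[i] = depth[parent] + 1
    # Children have larger pre-order indices, so a reverse pass finalizes scopes.
    for i in range(n - 1, 0, -1):
        parent = tree[i][1]
        last[parent] = max(last[parent], last[i])
    return depth, last


def occurs(data: Tree, pattern: Tree, max_level: int) -> bool:
    """Naive test: does `pattern` occur in `data` as an ordered subtree whose
    parent-child edges each span at most `max_level` levels (assumed definition)?"""
    depth, last = _depths_and_scopes(data)

    # Previous sibling of every pattern node, or -1 if it is a first child.
    prev_sib = [-1] * len(pattern)
    last_child = {}
    for k, (_, parent) in enumerate(pattern):
        if parent >= 0:
            prev_sib[k] = last_child.get(parent, -1)
            last_child[parent] = k

    image = [0] * len(pattern)  # image[k] = data node matched to pattern node k

    def place(k: int) -> bool:
        if k == len(pattern):
            return True
        label, parent = pattern[k]
        if parent < 0:
            lo, hi = 0, len(data) - 1
        else:
            # Must be a proper descendant of the parent's image ...
            lo, hi = image[parent] + 1, last[image[parent]]
            # ... and lie to the right of the previous sibling's whole scope.
            if prev_sib[k] >= 0:
                lo = max(lo, last[image[prev_sib[k]]] + 1)
        for j in range(lo, hi + 1):
            if data[j][0] != label:
                continue
            if parent >= 0 and depth[j] - depth[image[parent]] > max_level:
                continue
            image[k] = j
            if place(k + 1):
                return True
        return False

    return place(0)


# Data tree a(b(c), d) in pre-order.
data_tree: Tree = [("a", -1), ("b", 0), ("c", 1), ("d", 0)]
```

With this data tree, the pattern `a(c)` skips the intermediate node `b`, so under the stated assumption it is found only when `max_level` is at least 2, whereas `a(b, d)` is already found at `max_level=1`. The pattern `a(d, c)` is never found because it violates the left-to-right sibling order.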
Publisher
Association for Computing Machinery (ACM)
Cited by
21 articles.