A performance study of three disk-based structures for indexing and querying frequent itemsets
Published: 2013-05
Issue: 7
Volume: 6
Page: 505-516
ISSN: 2150-8097
Container-title: Proceedings of the VLDB Endowment
Language: en
Short-container-title: Proc. VLDB Endow.
Author:
Liu Guimei (1),
Suchitra Andre (1),
Wong Limsoon (1)
Affiliation:
1. School of Computing, National University of Singapore
Abstract
Frequent itemset mining is an important problem in data mining. Extensive effort has been devoted to developing efficient algorithms for mining frequent itemsets. However, little attention has been paid to managing the large collections of frequent itemsets that these algorithms produce for subsequent analysis and user exploration. In this paper, we study three structures for indexing and querying frequent itemsets: inverted files, signature files, and the CFP-tree. The first two structures have been widely used for indexing general set-valued data; we modify them to make them more suitable for indexing frequent itemsets. The CFP-tree structure is specially designed for storing frequent itemsets; we add a pruning technique based on length-2 frequent itemsets to make it more efficient at processing superset queries. We study the performance of the three structures in supporting five types of containment queries: exact match, subset search, superset search, immediate subset search, and immediate superset search. Our results show that no single structure outperforms the others for all five query types on all datasets, but the CFP-tree shows better overall performance than the other two.
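To make the five containment queries named in the abstract concrete, the following is a minimal in-memory sketch built on a toy inverted file. It is not the disk-based structure evaluated in the paper and does not include the authors' modifications. The class name `InvertedIndex`, its method names, and the sample itemsets are hypothetical. The abstract does not define "immediate"; the sketch assumes it means a stored itemset that differs from the query by exactly one item.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted file over non-empty itemsets (hypothetical helper)."""

    def __init__(self, itemsets):
        self.itemsets = [frozenset(s) for s in itemsets]
        # item -> ids of the stored itemsets that contain it
        self.postings = defaultdict(set)
        for sid, s in enumerate(self.itemsets):
            for item in s:
                self.postings[item].add(sid)

    def supersets(self, query):
        """Stored itemsets containing every item of the query."""
        q = frozenset(query)
        if not q:
            return set(self.itemsets)
        # Intersect the posting lists of all query items.
        ids = set.intersection(*(self.postings.get(i, set()) for i in q))
        return {self.itemsets[i] for i in ids}

    def subsets(self, query):
        """Stored itemsets whose items all occur in the query."""
        q = frozenset(query)
        hits = defaultdict(int)  # itemset id -> number of its items found in q
        for item in q:
            for sid in self.postings.get(item, ()):
                hits[sid] += 1
        return {self.itemsets[sid] for sid, n in hits.items()
                if n == len(self.itemsets[sid])}

    def exact(self, query):
        """True if the query itself is a stored itemset."""
        q = frozenset(query)
        return any(s == q for s in self.supersets(q))

    def immediate_supersets(self, query):
        """Assumption: supersets with exactly one extra item."""
        n = len(frozenset(query))
        return {s for s in self.supersets(query) if len(s) == n + 1}

    def immediate_subsets(self, query):
        """Assumption: subsets with exactly one item fewer."""
        n = len(frozenset(query))
        return {s for s in self.subsets(query) if len(s) == n - 1}


# Made-up sample collection, for illustration only.
index = InvertedIndex([
    {"a"}, {"b"}, {"c"},
    {"a", "b"}, {"a", "c"}, {"b", "c"},
    {"a", "b", "c"},
])
query = {"a", "b"}
print(index.exact(query))
print(sorted(sorted(s) for s in index.subsets(query)))
print(sorted(sorted(s) for s in index.supersets(query)))
print(sorted(sorted(s) for s in index.immediate_subsets(query)))
print(sorted(sorted(s) for s in index.immediate_supersets(query)))
```

In this sketch a superset query intersects posting lists, while a subset query counts how many items of each candidate appear in the query. That asymmetry is one plausible reason the relative cost of the two query types can differ between index structures.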
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
1 article.
1. Supporting Exploratory Hypothesis Testing and Analysis;ACM Transactions on Knowledge Discovery from Data;2015-06