Abstract
Here, we present a novel algorithm for frequent itemset mining in streaming data (FIM-SD). Over the past decade, various FIM-SD methods have been proposed in the one-pass approximation setting, in which the support of each itemset is approximated. They can be categorized into two approximation types: parameter-constrained (PC) mining and resource-constrained (RC) mining. PC methods bound the maximum error that the approximate support can contain by a pre-defined parameter. In contrast, RC methods bound the maximum memory consumption according to resource constraints. However, the memory consumption of existing PC methods can grow exponentially, while the maximum error of existing RC methods can grow rapidly. In this study, we address this problem by introducing a hybrid PC-RC approximation approach called PARASOL. For any streaming data, PARASOL is guaranteed to provide a condensed representation, called a Δ-covered set, which can be regarded as an extension of closedness-based compression; when Δ = 0, the solution corresponds to the ordinary closed itemsets. PARASOL searches for such approximate closed itemsets, which can restore the frequent itemsets and their supports with the maximum error bounded by an integer Δ. We then empirically demonstrate that the proposed algorithm significantly outperforms the state-of-the-art PC and RC methods for FIM-SD.
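The abstract does not state the formal definition of a Δ-covered set, so the following is only a minimal sketch under an assumed reading: a set C of frequent itemsets is Δ-covered if every frequent itemset X has some Y in C with X ⊆ Y and sup(X) − sup(Y) ≤ Δ. Restoring sup(X) as the largest support among stored supersets then undershoots the true support by at most Δ, and with Δ = 0 the greedy construction below keeps exactly the closed frequent itemsets. The helper names (`all_supports`, `delta_cover`, `estimate`) are hypothetical, the example works by brute force on a tiny static database, and it is not PARASOL's streaming algorithm.

```python
from itertools import combinations


def all_supports(transactions, min_sup):
    """Brute-force support of every itemset with support >= min_sup (tiny data only)."""
    items = sorted(set().union(*transactions))
    sup = {}
    for k in range(1, len(items) + 1):
        level_found = False
        for combo in combinations(items, k):
            x = frozenset(combo)
            count = sum(1 for t in transactions if x <= t)
            if count >= min_sup:
                sup[x] = count
                level_found = True
        if not level_found:
            break  # anti-monotonicity: no larger itemset can be frequent
    return sup


def delta_cover(sup, delta):
    """Greedy Δ-cover (assumed definition): keep X unless a kept superset Y
    already satisfies sup(X) - sup(Y) <= delta. With delta = 0 this keeps
    exactly the closed frequent itemsets."""
    cover = {}
    # Larger itemsets first, so every candidate superset is decided before X.
    for x in sorted(sup, key=len, reverse=True):
        covered = any(x <= y and sup[x] - s_y <= delta for y, s_y in cover.items())
        if not covered:
            cover[x] = sup[x]
    return cover


def estimate(x, cover):
    """Restore sup(x) as the largest support of a stored superset.
    Returns 0 when no stored superset exists, i.e. x is not frequent."""
    return max((s for y, s in cover.items() if x <= y), default=0)


if __name__ == "__main__":
    db = [frozenset(t) for t in ("abc", "abc", "ab", "ac", "a")]
    sup = all_supports(db, min_sup=2)
    for delta in (0, 1):
        cover = delta_cover(sup, delta)
        print(delta, sorted("".join(sorted(c)) for c in cover))
        # Every restored support undershoots the true one by at most delta.
        assert all(0 <= sup[x] - estimate(x, cover) <= delta for x in sup)
```

On this toy database, Δ = 0 keeps the four closed itemsets `['a', 'ab', 'abc', 'ac']`, while Δ = 1 shrinks the representation to `['a', 'abc']` at the cost of support estimates that may be off by one. This illustrates the memory/error trade-off that a Δ-bounded representation is meant to control.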
Funder
Japan Society for the Promotion of Science
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Computer Networks and Communications, Hardware and Architecture, Information Systems, Software
Cited by
4 articles.
1. Interpretable Analysis of Production GPU Clusters Monitoring Data via Association Rule Mining;2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2024-05-27
2. Online Closed Episode Mining with Root-Order Decomposition;2023 IEEE International Conference on Big Data (BigData);2023-12-15
3. Mining frequent Itemsets from transaction databases using hybrid switching framework;Multimedia Tools and Applications;2023-02-16
4. CICLAD: A Fast and Memory-efficient Closed Itemset Miner for Streams;Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining;2020-07-06