SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

Published:2020-02-04 Issue:10 Volume:76 Page:7619-7634
ISSN:0920-8542
Container-title:The Journal of Supercomputing
language:en
Short-container-title:J Supercomput

Author:

Xiao Wen, Hu Juan

Abstract

Finding frequent itemsets in continuous streaming data is an important data mining task that is widely used in network monitoring, Internet of Things data analysis and similar applications. In the era of big data, a distributed frequent itemset mining algorithm is needed to process massive streaming data. Apache Spark is a unified analytics engine for massive data processing that has been used successfully in many data mining fields. In this paper, we propose SWEclat, a distributed algorithm for mining frequent itemsets over massive streaming data. The algorithm processes the stream with a sliding window and stores the dataset in the window in a vertical data structure. It is implemented on Apache Spark and uses Spark RDDs to store both the streaming data and the dataset in vertical format, so that these RDDs can be divided into partitions for distributed processing. Experimental results show that SWEclat achieves good speedup, parallel scalability and load balancing.
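The abstract names two ingredients, a sliding window and a vertical data format, without giving implementation details. The following is a minimal single-machine sketch of how those two ideas can fit together in an Eclat-style miner. It is not the authors' Spark implementation: the class name `SlidingWindowEclat`, its parameters `window_size` and `min_support`, the count-based window, and the absolute support threshold are all assumptions made for illustration.

```python
from collections import deque


class SlidingWindowEclat:
    """Illustrative sketch: sliding window plus vertical (item -> tidset) layout."""

    def __init__(self, window_size, min_support):
        self.window_size = window_size  # assumed: max number of transactions kept
        self.min_support = min_support  # assumed: absolute support threshold
        self.window = deque()           # (tid, items) pairs in arrival order
        self.tidsets = {}               # vertical format: item -> set of tids
        self.next_tid = 0

    def add(self, transaction):
        """Insert one transaction, evicting the oldest if the window is full."""
        if len(self.window) == self.window_size:
            old_tid, old_items = self.window.popleft()
            for item in old_items:
                tids = self.tidsets[item]
                tids.discard(old_tid)
                if not tids:
                    del self.tidsets[item]
        tid = self.next_tid
        self.next_tid += 1
        items = frozenset(transaction)
        self.window.append((tid, items))
        for item in items:
            self.tidsets.setdefault(item, set()).add(tid)

    def frequent_itemsets(self):
        """Return {itemset as sorted tuple: support} for the current window."""
        frequent = sorted(
            ((item, tids) for item, tids in self.tidsets.items()
             if len(tids) >= self.min_support),
            key=lambda pair: pair[0],
        )
        result = {}
        self._extend((), frequent, result)
        return result

    def _extend(self, prefix, candidates, result):
        """Depth-first Eclat step: intersect tidsets of items sharing a prefix."""
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            narrower = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids  # support = size of the intersection
                if len(common) >= self.min_support:
                    narrower.append((other, common))
            if narrower:
                self._extend(itemset, narrower, result)


# Usage: a window of 4 transactions, itemsets must appear at least twice.
miner = SlidingWindowEclat(window_size=4, min_support=2)
for t in [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"], ["a", "b"]]:
    miner.add(t)
current = miner.frequent_itemsets()
```

Because support is computed by intersecting transaction-id sets, sliding the window only requires removing the evicted transaction's id from the tidsets of its items. In a distributed setting such as the one the abstract describes, the vertical dataset would be partitioned across workers rather than held in one dictionary.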

Funder

2019 Key Project of Natural Science Research in Universities in Anhui Province

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture,Information Systems,Theoretical Computer Science,Software

Link

http://link.springer.com/content/pdf/10.1007/s11227-020-03190-5.pdf

Cited by 20 articles.

1. A review on Ensemble learning based maximal frequent pattern mining over Cloud;2024 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS);2024-02-24

2. Personalized Sports Health Recommendation System Assisted by Q-Learning Algorithm;International Journal of Human–Computer Interaction;2024-01-08

3. Probabilistic Support Prediction: Fast Frequent Itemset Mining in Dense Data;IEEE Access;2024

4. New Spark solutions for distributed frequent itemset and association rule mining algorithms;Cluster Computing;2023-04-30

5. Development of Surface Mining 4.0 in Terms of Technological Shock in Energy Transition: A Review;Energies;2023-04-24