
New Spark solutions for distributed frequent itemset and association rule mining algorithms

Published: 2023-04-30
ISSN: 1386-7857
Journal: Cluster Computing (abbreviated: Cluster Comput)
Language: en

Authors:

Carlos Fernandez-Basso, M. Dolores Ruiz, Maria J. Martin-Bautista

Abstract

The large amount of data generated every day makes it necessary to implement new methods capable of handling massive data efficiently. This is the case for association rules, an unsupervised data mining tool capable of extracting information in the form of IF-THEN patterns. Although several methods have been proposed for extracting frequent itemsets (the phase that precedes association rule mining) in very large databases, high computational cost and insufficient memory remain major problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for frequent itemset and association rule mining, (2) to develop new efficient frequent itemset algorithms for Big Data using distributed computation, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with existing proposals, varying the number of transactions and the number of items. To this end, we have used the Spark platform, which has been shown to outperform existing distributed algorithm implementations.
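The abstract outlines two phases: first find the frequent itemsets, then derive IF-THEN association rules from them, with the expensive support counting spread over a cluster. The Python below is a minimal single-process sketch of those phases under stated assumptions, not the authors' Spark algorithms: it uses the classic Apriori level-wise search (Agrawal and Srikant, listed in the references) and splits the transactions into hypothetical "partitions" whose local support counts are summed, as a stand-in for per-partition counting on Spark workers followed by a reduce. The function names `count_support`, `frequent_itemsets` and `association_rules`, the toy basket data, and the thresholds are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Transaction = Set[str]


def count_support(partitions: List[List[Transaction]], candidates: Set[Itemset]) -> Counter:
    """Count how many transactions contain each candidate itemset.

    Each partition is counted independently (the "map" side) and the partial
    counts are summed (the "reduce" side). This only mimics how the work could
    be spread over cluster workers; here everything runs in one process.
    """
    total: Counter = Counter()
    for part in partitions:
        local = Counter(c for t in part for c in candidates if c <= t)
        total.update(local)  # adds the partition's counts to the running totals
    return total


def frequent_itemsets(partitions: List[List[Transaction]], min_support: float) -> Dict[Itemset, int]:
    """Apriori-style level-wise search; returns support counts of frequent itemsets."""
    n = sum(len(part) for part in partitions)
    candidates = {frozenset([item]) for part in partitions for t in part for item in t}
    frequent: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        counts = count_support(partitions, candidates)
        # Keep candidates whose support (fraction of transactions) meets the threshold
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join pairs of frequent (k-1)-itemsets into k-itemset candidates and prune
        # any candidate with an infrequent (k-1)-subset (downward closure property)
        prev = list(level)
        candidates = {
            a | b
            for i, a in enumerate(prev)
            for b in prev[i + 1:]
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def association_rules(counts: Dict[Itemset, int], min_confidence: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Derive IF antecedent THEN consequent rules from frequent itemset counts.

    confidence(A -> C) = count(A | C) / count(A). Every subset of a frequent
    itemset is itself frequent, so count(A) is always present in `counts`.
    """
    rules = []
    for itemset, count in counts.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = count / counts[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical toy basket data split into two "partitions"
    partitions = [
        [{"bread", "milk"}, {"bread", "diapers", "beer", "eggs"}, {"milk", "diapers", "beer", "cola"}],
        [{"bread", "milk", "diapers", "beer"}, {"bread", "milk", "diapers", "cola"}],
    ]
    freq = frequent_itemsets(partitions, min_support=0.6)
    for antecedent, consequent, conf in association_rules(freq, min_confidence=0.8):
        print(f"IF {sorted(antecedent)} THEN {sorted(consequent)} (confidence {conf:.2f})")
```

On the toy data, {bread, milk, diapers} is generated as a candidate but appears in only 2 of 5 baskets, below the 0.6 support threshold, and at a confidence threshold of 0.8 the only rule printed is IF ['beer'] THEN ['diapers'] (all 3 baskets containing beer also contain diapers). Counting each partition separately and merging the counts keeps every worker's job independent, which is what makes this step distributable. For real Spark jobs, MLlib's built-in `FPGrowth` (FP-growth rather than Apriori) exposes comparable `minSupport` and `minConfidence` parameters.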

Funder

BIGDATAMED project, Andalusian Government

Margarita Salas programme (EU-funded, NextGenerationEU)

Universidad de Granada

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Software

Link

https://link.springer.com/content/pdf/10.1007/s10586-023-04014-w.pdf

References (62 articles; the first 5 are listed)

1. Wu, X., Zhu, X., Wu, G.-Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)

2. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

3. Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D., Amde, M., Owen, S., et al.: MLlib: machine learning in Apache Spark. J. Mach. Learn. Res. 17(1), 1235–1241 (2016)

4. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. 20th Int. Conf. Very Large Data Bases (VLDB), pp. 487–499 (1994)

5. Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)

Cited by (4 articles)

1. A scalable and flexible basket analysis system for big transaction data in Spark. Information Processing & Management (2024-03)

2. An AI knowledge‐based system for police assistance in crime investigation. Expert Systems (2024-01-09)

3. A big data association rule mining based approach for energy building behaviour analysis in an IoT environment. Scientific Reports (2023-11-13)

4. Apriori Algorithm and Hybrid Apriori Algorithm in the Data Mining: A Comprehensive Review. E3S Web of Conferences (2023)