
ASCF: Optimization of the Apriori Algorithm Using Spark-Based Cuckoo Filter Structure

Published: 2024-01-22
Volume: 2024
Pages: 1-16
ISSN: 1098-111X
Journal: International Journal of Intelligent Systems
Language: English

Authors:

Alrahwan, Bana Ahmad¹; Farouk, Mona¹

Affiliation:

1. Computer Engineering Department, Faculty of Engineering, Cairo University, Cairo, Egypt

Abstract

Data mining is the process of extracting hidden patterns from large databases using a variety of techniques. In supermarkets, for example, it can reveal which items are often purchased together, information that is otherwise hidden in the transaction data and that supports better business decisions. One technique for discovering frequent patterns in large databases is frequent itemset mining (FIM), which is part of association rule mining (ARM). Of the many algorithms for mining frequent itemsets, one of the most common is the Apriori algorithm, which derives association rules that describe how different objects are related. It is used in application areas such as market basket analysis, students' course selection on e-learning platforms, stock management, and medical applications. However, the rapid growth in data volumes increases the computational time of the Apriori algorithm, so data-intensive algorithms need to run in a parallel, distributed environment to achieve acceptable performance. This paper introduces ASCF, an optimization of the Apriori algorithm that uses a Spark-based cuckoo filter structure. ASCF removes the candidate generation step from the Apriori algorithm, which reduces computational complexity and avoids costly comparisons. It uses the cuckoo filter structure to prune transactions by reducing the number of items in each transaction. The proposed algorithm is implemented on Spark, an in-memory distributed processing framework, to reduce processing time. ASCF substantially outperforms the other Apriori-based algorithms it is compared against: on the retail dataset with a minimum support of 0.75%, its execution time is only 5.8% of that of the state-of-the-art approach.
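The abstract does not spell out ASCF's exact data flow, so the sketch below is only one plausible, single-machine reading of the two ideas it names: counting itemsets without a separate candidate-generation step, and using a cuckoo filter to strip items out of transactions. It is plain Python rather than Spark, and every name in it (`CuckooFilter`, `mine_frequent_itemsets`, `_hash64`) as well as the 8-bit fingerprints, 4-slot buckets and the level-wise pruning rule are illustrative assumptions, not the authors' implementation. The pruning is safe because a cuckoo filter has no false negatives: an item that occurs in a frequent itemset is never removed, while a false positive merely leaves an extra item whose combinations are then counted exactly and discarded.

```python
import hashlib
import random
from collections import Counter
from itertools import combinations


def _hash64(text: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")


class CuckooFilter:
    """Hypothetical minimal cuckoo filter: 8-bit fingerprints, partial-key cuckoo hashing."""

    def __init__(self, capacity: int, bucket_size: int = 4, max_kicks: int = 500):
        # Power-of-two bucket count keeps (index ^ hash(fp)) & mask valid and reversible.
        needed = max(1, -(-capacity // bucket_size))
        self.num_buckets = 1 << (needed - 1).bit_length()
        self.mask = self.num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(self.num_buckets)]
        self.rng = random.Random(0)  # seeded RNG for eviction choices

    def _fingerprint(self, item: str) -> int:
        return _hash64("fp:" + item) % 255 + 1  # 1..255

    def _alt_index(self, index: int, fp: int) -> int:
        return (index ^ _hash64(str(fp))) & self.mask

    def insert(self, item: str) -> None:
        fp = self._fingerprint(item)
        i1 = _hash64(item) & self.mask
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return
        # Both candidate buckets are full: evict fingerprints cuckoo-style.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return
        raise RuntimeError("cuckoo filter is full; increase capacity")

    def __contains__(self, item: str) -> bool:
        # May return false positives, but never false negatives.
        fp = self._fingerprint(item)
        i1 = _hash64(item) & self.mask
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]


def mine_frequent_itemsets(transactions, min_support):
    """Level-wise FIM with cuckoo-filter pruning; min_support is a fraction (0.0075 = 0.75%).
    Items are strings; returns {sorted itemset tuple: support count}."""
    min_count = min_support * len(transactions)
    txns = [sorted(set(t)) for t in transactions]
    frequent_all = {}
    k = 1
    while txns:
        # Count every k-subset of each (pruned) transaction directly instead of
        # generating candidates by joining frequent (k-1)-itemsets.
        counts = Counter(c for t in txns for c in combinations(t, k))
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        frequent_all.update(frequent)
        # An item can only be in a frequent (k+1)-itemset if it already
        # appears in some frequent k-itemset (anti-monotonicity of support).
        survivors = {item for c in frequent for item in c}
        cf = CuckooFilter(capacity=2 * len(survivors))  # at most ~50% load
        for item in survivors:
            cf.insert(item)
        # Drop items the filter rejects, then transactions with <= k items.
        pruned = ([item for item in t if item in cf] for t in txns)
        txns = [t for t in pruned if len(t) > k]
        k += 1
    return frequent_all


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]
    for itemset, count in sorted(mine_frequent_itemsets(baskets, 0.6).items()):
        print(itemset, count)
```

On the five toy baskets with `min_support=0.6` (at least 3 of 5 baskets), this reports four frequent single items (bread, milk, diapers, beer) and four frequent pairs, and no frequent triples. Enumerating every k-subset of a long transaction grows combinatorially, which is why shrinking transactions before each level matters. On Spark, the per-transaction pruning and subset enumeration would presumably map onto `flatMap` and `reduceByKey` over an RDD of transactions, with the filter shipped to executors as a broadcast variable; that distribution layer is omitted here.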

Publisher

Hindawi Limited

Link

http://downloads.hindawi.com/journals/ijis/2024/8781318.pdf

Cited by 1 article.

1. Enhancing Forest Fire Risk Assessment: An Ontology-Based Approach with Improved Continuous Apriori Algorithm; Forests; 2024-05-31