PUC: parallel mining of high-utility itemsets with load balancing on Spark

Published: 2022-01-01 Issue: 1 Volume: 31 Page: 568-588
ISSN: 2191-026X
Container-title: Journal of Intelligent Systems
Language: en

Author:

Brahmavar Anup Bhat¹,Sheeranalli Venkatarama Harish¹,Maiya Geetha¹

Affiliation:

1. Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India

Abstract

Distributed programming paradigms such as MapReduce and Spark have alleviated the sequential bottleneck in mining massive transaction databases. Of particular importance is the mining of High Utility Itemsets (HUIs), which incorporates the revenue of the items purchased in a transaction. Although a few algorithms for mining HUIs in a distributed environment exist, workload skew and the data transfer overhead caused by shuffling operations remain major issues. In the current study, the Parallel Utility Computation (PUC) algorithm is proposed, with novel grouping and load balancing strategies for efficient mining of HUIs in a distributed environment. To group the items, Transaction Weighted Utility (TWU) values are employed as a degree of transaction similarity. Subsequently, these groups are assigned to the nodes across the cluster by taking into account the mining load due to the items in each group. Experimental evaluation on real and synthetic datasets demonstrates that PUC with TWU grouping, in conjunction with load balancing, completes mining faster. Owing to reduced data transfer and the load balancing-based assignment strategy, PUC outperforms alternative grouping strategies and random assignment of groups across the cluster. PUC is also shown to be faster than the PHUI-Growth algorithm, with a promising speedup.
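
The two ideas named in the abstract, TWU values and load-aware assignment of groups to nodes, can be illustrated with a minimal single-machine sketch. This is not the PUC algorithm itself: the abstract does not specify how groups are formed or how mining load is estimated. The toy transactions, the function names `transaction_weighted_utility` and `assign_groups`, the use of TWU as a stand-in for mining load, and the greedy "heaviest first, least-loaded node" rule are all assumptions made for illustration. Only the TWU definition (the sum of the total utilities of all transactions containing an item) is the standard one.

```python
from collections import defaultdict
import heapq

# Toy transactions (hypothetical data): item -> utility of that item
# in the transaction, e.g. quantity * unit profit.
transactions = [
    {"a": 5, "b": 10},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 2},
]

def transaction_weighted_utility(transactions):
    """TWU(item) = sum of the total utilities of all transactions containing item."""
    twu = defaultdict(int)
    for t in transactions:
        tu = sum(t.values())  # total utility of this transaction
        for item in t:
            twu[item] += tu
    return dict(twu)

def assign_groups(group_loads, num_nodes):
    """Greedy balancing (an assumed heuristic, not the paper's strategy):
    take groups from heaviest to lightest and place each on the node
    with the smallest accumulated load so far."""
    heap = [(0, node) for node in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    assignment = {}
    # Ties on load are broken by group name to keep the result deterministic.
    for group, load in sorted(group_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        node_load, node = heapq.heappop(heap)
        assignment[group] = node
        heapq.heappush(heap, (node_load + load, node))
    return assignment

twu = transaction_weighted_utility(transactions)
# Assumption: each item forms its own group and its TWU approximates its mining load.
placement = assign_groups(twu, num_nodes=2)
```

In a Spark setting the TWU pass would typically be a map over transactions followed by a per-item aggregation, and the resulting placement would drive a custom partitioner; those details are left out here because the abstract does not describe them.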

Publisher

Walter de Gruyter GmbH

Subject

Artificial Intelligence,Information Systems,Software

Link

https://www.degruyter.com/document/doi/10.1515/jisys-2022-0044/pdf

References (42 articles)

1. Gartner SW. 3 steps to get the most from customer data. 2017. https://www.gartner.com/smarterwithgartner/3-steps-to-get-the-most-from-customer-data/. Accessed: 2021-03-26.

2. Tran T, Vo B, Le TTN, Nguyen NT. Text clustering using frequent weighted utility itemsets. Cybern. Syst. 2017;48(3):193–209. 10.1080/01969722.2016.1276774.

3. Djenouri Y, Belhadi A, Fournier-Viger P, Lin JC. Fast and effective cluster-based information retrieval using frequent closed itemsets. Inf Sci. 2018;453:154–67. 10.1016/j.ins.2018.04.008.

4. Naulaerts S, Meysman P, Bittremieux W, Vu TN, Berghe W, Goethals B, et al. A primer to frequent itemset mining for bioinformatics. Brief Bioinform. 2015;16(2):216–31. 10.1093/bib/bbt074.

5. Henriques R, Ferreira FL, Madeira SC. Bicpams: software for biological data analysis with pattern-based biclustering. BMC Bioinform. 2017;18(1):1–6. 10.1186/s12859-017-1493-3.

Cited by 1 article.

1. A fast and highly scalable frequent pattern mining algorithm. Future Generation Computer Systems. 2024-11.