Author:
Yasir Muhammad,Asif Habib Muhammad,Sarwar Shahzad,Nadeem Faisal Chaudhry Muhammad,Ahmad Mudassar,Jabbar Sohail
Abstract
Modern algorithms for mining frequent itemsets face noteworthy deterioration of performance when minimum support tends to decrease, especially for sparse datasets. Long-tailed itemsets, frequent itemsets found at lower minimum support, are significant for present-day applications such as recommender systems. In this study, we have developed a novel power set based method named as HARnessing the Power of Power sets (HARPP) for mining frequent itemsets. HARPP iteratively generates power sets to make combinations of overlapping varying-sized subsets of I, where I is a set of items in a large database. Intrinsic feature of creating power sets along with the use of set data structure ensures the agility of HARPP because most of its operations take constant running time. Without storing it entirely in memory, HARPP scans the dataset only once and mines frequent itemsets on the fly. In contrast to state-of-the-art, efficiency of HARPP increases with decrease in minimum support that makes it a viable technique for mining long-tailed itemsets. Performance study shows that HARPP is efficient and scalable, and is faster up to two orders of magnitude than FP-Growth algorithm at lower minimum support particularly when datasets are sparse.
Publisher
Kaunas University of Technology (KTU)
Subject
Electrical and Electronic Engineering,Computer Science Applications,Control and Systems Engineering
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献