Abstract
AbstractStatistically significant pattern mining (SSPM) is to mine patterns with significance based on hypothesis test. Under the constraint of statistical significance, our study aims to introduce a new preference relation into high utility patterns and to discover high utility and significant patterns (HUSPs) from transaction datasets, which has never been considered in existing SSPM problems. Our approach can be divided into two parts, HUSP-Mining and HUSP-Test. HUSP-Mining looks for HUSP candidates and HUSP-Test tests their significance. HUSP-Mining is not outputting all high utility itemsets (HUIs) as HUSP candidates; it is established based on candidate length and testable support requirements which can remove many insignificant HUIs early in the mining process; compared with the traditional HUIs mining algorithm, it can get candidates in a short time without losing the real HUSPs. HUSP-Test is to draw significant patterns from the results of HUSP-Mining based on Fisher’s test. We propose an iterative multiple testing procedure, which can alternately and efficiently reject a hypothesis and safely ignore the hypotheses that have less utility than the rejected hypothesis. HUSP-Test controls Family-wise Error Rate (FWER) under a user-defined threshold by correcting the test level which can find more HUSPs than standard Bonferroni’s control. Substantial experiments on real datasets show that our algorithm can draw HUSPs efficiently from transaction datasets with strong mathematical guarantee.
Funder
Zhejiang Province Public Welfare Technology Application Research Project
Natural Science Foundation of Zhejiang Province
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics,General Computer Science
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献