Affiliation:
1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
2. Faculty of Finance and Information, Ningbo University of Finance & Economics, Ningbo 315175, China
Abstract
Mining high-utility pattern (HUP) on transactional datasets has been widely discussed, and various algorithms have been introduced to settle this problem. However, the time-space efficiency of the algorithms is still limited, and the mining system cannot provide timely feedback on relevant information. In addition, when mining HUP from taxonomy transactional datasets, a large portion of the quantitative results are just accidental responses to the user-defined utility constraints, and they may have no statistical significance. To address these two problems, we propose two corresponding approaches named Sampling HUP-Miner and Significant HUP-Miner. Sampling HUP-Miner pursues a sample size of a transitional dataset based on a theoretical guarantee; the mining results based on such a sample size can be an effective approximation to the results on the whole datasets. Significant HUP-Miner proposes the concept of testable support, and significant HUPs could be drawn timely based on the constraint of testable support. Experiments show that the designed two algorithms can discover approximate and significant HUPs smoothly and perform well according to the runtime, pattern numbers, memory usage, and average utility.