Affiliation:
1. HIIT, Aalto University, KU Leuven and University of Antwerp, Espoo, Finland
2. Siemens Corporation USA
3. Université Libre de Bruxelles and Eindhoven University of Technology, Brussels, Belgium
Abstract
Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this article we propose a novel theoretical framework for pattern reduction by measuring the robustness of a property of an itemset such as closedness or nonderivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties, namely an itemset being closed, free, non-derivable, or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and, in contrast to noise-tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets.
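The definition of robustness given in the abstract can be illustrated with a small Monte Carlo sketch in Python. This is not the analytic computation the article proposes; it estimates the same quantity by brute-force sampling, purely to make the definition concrete. The function names, the subsampling model (each transaction kept independently with probability `alpha`), and the convention that an itemset with no supporting transactions is not closed are all assumptions made for this illustration.

```python
import random


def is_closed(itemset, transactions):
    """Return True if `itemset` is closed in `transactions`.

    An itemset is closed when no proper superset has the same support,
    i.e. when it equals the intersection of all transactions containing it.
    Assumption for this sketch: an itemset with zero support is not closed.
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return False
    closure = set.intersection(*map(set, covering))
    return closure == set(itemset)


def estimate_robustness(itemset, transactions, alpha=0.5, trials=2000, seed=0):
    """Estimate the probability that `itemset` stays closed on a random subset.

    Assumed sampling model: every transaction is kept independently
    with probability `alpha`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [t for t in transactions if rng.random() < alpha]
        hits += is_closed(itemset, sample)
    return hits / trials


# Hypothetical toy data: two transactions over the items a, b, c.
data = [frozenset("ab"), frozenset("abc")]
print(estimate_robustness(frozenset("ab"), data, alpha=0.5))
```

In this toy data set, `{a, b}` stays closed exactly when the transaction `{a, b}` survives the sampling, so under the assumed model the estimate should come out close to 0.5. The itemset `{a}` is not closed in the full data and can never become closed in any subset here, which matches the abstract's remark that robust patterns are always a subset of the patterns that have the property.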
Funder
Fonds Wetenschappelijk Onderzoek
Publisher
Association for Computing Machinery (ACM)
Cited by
8 articles.