Affiliations:
1. Technische Universität Berlin, Berlin, Germany
2. Universität Mannheim, Mannheim, Germany
3. Universität Bayreuth, Bayreuth, Germany
Abstract
Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints (including and beyond those considered in the literature) can be unified in a single framework. A unified treatment allows researchers to study many types of subsequence constraints jointly (instead of each one individually) and helps to improve the usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive “pattern expressions” to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite-state transducers, which we use as a computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms, although more general, are efficient and, when used for sequence mining with prior constraints studied in the literature, competitive with (and in some cases superior to) state-of-the-art specialized methods.
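To make the notion of a subsequence constraint concrete, the following is a minimal sketch of frequent sequence mining under two of the constraint types named above, a maximum gap and a maximum length. It is not the article's method: it enumerates candidate subsequences naively instead of compiling pattern expressions to finite-state transducers, and the function names (constrained_subsequences, mine) and parameters (max_gap, max_len, min_support) are hypothetical choices made for illustration only.

```python
from collections import Counter


def constrained_subsequences(seq, max_gap, max_len):
    """Return the distinct subsequences of `seq` that have at most
    `max_len` items (max_len >= 1 assumed) and skip at most `max_gap`
    input items between two consecutive output items."""
    found = set()

    def extend(last_idx, pattern):
        found.add(pattern)
        if len(pattern) == max_len:
            return
        # The next item may lie at most max_gap + 1 positions ahead.
        stop = min(len(seq), last_idx + max_gap + 2)
        for j in range(last_idx + 1, stop):
            extend(j, pattern + (seq[j],))

    for i in range(len(seq)):
        extend(i, (seq[i],))
    return found


def mine(db, min_support, max_gap, max_len):
    """Count, for every constrained subsequence, the number of input
    sequences that contain it, and keep those meeting `min_support`."""
    counts = Counter()
    for seq in db:
        # A set per sequence, so each sequence contributes at most once.
        counts.update(constrained_subsequences(seq, max_gap, max_len))
    return {p: c for p, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    db = [list("abcb"), list("acb"), list("ab")]
    for pattern, support in sorted(mine(db, 2, 1, 2).items()):
        print("".join(pattern), support)
```

Tightening max_gap from 1 to 0 in this toy database removes the pattern "ac" from the result, since only one input sequence contains "a" immediately followed by "c". A brute-force enumeration like this grows quickly with sequence length, which is the kind of cost a transducer-based approach aims to avoid.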
Publisher
Association for Computing Machinery (ACM)
Cited by
6 articles.