A Unified Framework for Frequent Sequence Mining with Subsequence Constraints

Author:

Kaustubh Beedkar (1), Rainer Gemulla (2), Wim Martens (3)

Affiliation:

1. Technische Universität Berlin, Berlin, Germany

2. Universität Mannheim, Mannheim, Germany

3. Universität Bayreuth, Bayreuth, Germany

Abstract

Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints, including and beyond those considered in the literature, can be unified in a single framework. A unified treatment allows researchers to study many types of subsequence constraints jointly (instead of each one individually) and helps improve the usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive “pattern expressions” to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite-state transducers, which we use as a computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms, although more general, are efficient and, when used for sequence mining with prior constraints studied in the literature, competitive with (and in some cases superior to) state-of-the-art specialized methods.
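To make the notion of a subsequence constraint concrete, the following is a minimal sketch of frequent sequence mining under a maximum-gap and a maximum-length constraint. It is a naive enumerate-and-count baseline written for illustration only; it is not the transducer-based algorithm of the article, and the names `constrained_subsequences`, `mine`, `max_gap`, `max_len`, and `min_support` are assumptions introduced here.

```python
from collections import Counter


def constrained_subsequences(seq, max_gap, max_len):
    """Return the distinct subsequences of `seq` with 2..max_len items in
    which at most `max_gap` input items are skipped between two
    consecutive picked items."""
    found = set()
    n = len(seq)

    def extend(last_idx, prefix):
        # Record every prefix of length >= 2 as a candidate pattern.
        if len(prefix) >= 2:
            found.add(prefix)
        if len(prefix) == max_len:
            return
        # The next item may skip at most `max_gap` positions.
        for j in range(last_idx + 1, min(n, last_idx + max_gap + 2)):
            extend(j, prefix + (seq[j],))

    for i in range(n):
        extend(i, (seq[i],))
    return found


def mine(database, min_support, max_gap, max_len):
    """Count, per pattern, the number of input sequences that contain it
    (each sequence counts once) and keep patterns with enough support."""
    counts = Counter()
    for seq in database:
        counts.update(constrained_subsequences(seq, max_gap, max_len))
    return {p: c for p, c in counts.items() if c >= min_support}


# Hypothetical toy database of three input sequences.
database = [("a", "b", "c"), ("a", "c", "b"), ("a", "x", "b", "c")]
frequent = mine(database, min_support=2, max_gap=1, max_len=2)
```

In this toy run, tightening `max_gap` from 1 to 0 shrinks the result to patterns whose items are adjacent in the input, which is the kind of control over the mined subsequences that the constraints described in the abstract are meant to provide.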

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems
