Affiliation:
1. Hebei University of Technology, China
2. Dalian University of Technology, China
3. Hefei University of Technology, China
4. University of Vermont, Vermont, USA
Abstract
Pattern matching is a key issue in sequential pattern mining. Many researchers now focus on pattern matching with gap constraints. However, most of these studies address exact pattern matching, which is a special case of approximate pattern matching, the more challenging task. In this study, we introduce an approximate pattern matching problem under Hamming distance. Its objective is to compute the number of approximate occurrences of a pattern P with gap constraints in a sequence S under a similarity constraint d. To solve this problem effectively, we propose an efficient algorithm named Single-rOot Nettree for approximate pattern matchinG with gap constraints (SONG), based on a new non-linear data structure, the Single-root Nettree. Theoretical analysis and experiments reveal an interesting law: the ratio M(P, S, d)/N(P, S, m) approximately follows a binomial distribution, where M(P, S, d) and N(P, S, m) are the numbers of approximate occurrences whose distances to pattern P are exactly d (0 ≤ d ≤ m) and no more than m (the length of pattern P), respectively. Experimental results on real biological data validate the efficiency and effectiveness of SONG.
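To make the counting problem concrete, here is a minimal sketch in Python. It is not the SONG algorithm or the Single-root Nettree; it is a plain dynamic program swapped in for illustration. It rests on several assumptions not stated in the abstract: gaps[j] = (gmin, gmax) bounds the number of characters skipped between consecutive pattern positions, every valid position tuple counts as a distinct occurrence (no one-off or length constraint), and distance is the Hamming distance between P and the selected characters of S. The names count_by_distance, count_approx, and binomial_law are hypothetical. counts[d] corresponds to M(P, S, d), and sum(counts) corresponds to N(P, S, m).

```python
from math import comb
from typing import List, Tuple


def count_by_distance(pattern: str, gaps: List[Tuple[int, int]], seq: str) -> List[int]:
    """Hypothetical helper: count gap-constrained occurrences of `pattern` in `seq`,
    grouped by Hamming distance (assumes no one-off or length constraint).

    gaps[j] = (gmin, gmax) bounds how many characters of `seq` are skipped
    between the matches of pattern[j] and pattern[j + 1]. Returns counts where
    counts[k] is the number of position tuples differing from `pattern` in k places.
    """
    m, n = len(pattern), len(seq)
    if m == 0 or len(gaps) != m - 1:
        raise ValueError("need a non-empty pattern and one (min, max) gap per adjacent pair")
    # cur[i][k]: partial occurrences of pattern[:j + 1] ending at seq[i] with k mismatches
    cur = [[0] * (m + 1) for _ in range(n)]
    for i in range(n):
        cur[i][int(seq[i] != pattern[0])] = 1
    for j in range(1, m):
        gmin, gmax = gaps[j - 1]
        nxt = [[0] * (m + 1) for _ in range(n)]
        for i in range(n):
            if not any(cur[i]):
                continue
            # the next position must skip between gmin and gmax characters
            for i2 in range(i + gmin + 1, min(n, i + gmax + 2)):
                miss = int(seq[i2] != pattern[j])
                for k in range(m + 1 - miss):
                    nxt[i2][k + miss] += cur[i][k]
        cur = nxt
    # sum over all end positions of complete occurrences
    return [sum(cur[i][k] for i in range(n)) for k in range(m + 1)]


def count_approx(pattern: str, gaps: List[Tuple[int, int]], seq: str, d: int) -> int:
    """Occurrences within Hamming distance d; d = len(pattern) counts every occurrence."""
    return sum(count_by_distance(pattern, gaps, seq)[: d + 1])


def binomial_law(m: int, d: int, q: float) -> float:
    """Binomial prediction for M(P, S, d) / N(P, S, m), assuming each aligned
    character matches independently with probability q (illustrative assumption)."""
    return comb(m, d) * (1 - q) ** d * q ** (m - d)


# Usage: pattern "AB" with a gap of 0..1 characters in "AAB".
# Tuples (0,1) -> "AA" (distance 1), (0,2) and (1,2) -> "AB" (distance 0).
print(count_by_distance("AB", [(0, 1)], "AAB"))  # [2, 1, 0]
```

The dynamic program tracks, for each end position, how many partial occurrences carry each mismatch count, so all distances 0..m are obtained in one pass. Under the stated independence assumption (for example, a uniformly random DNA sequence with q = 0.25), one would expect counts[d] / sum(counts) to lie near binomial_law(m, d, q), which is the kind of law the abstract reports.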
Subject
Library and Information Sciences, Information Systems
Cited by
19 articles.