Affiliation:
1. Univ. of Arizona, Tucson
Abstract
The approximate string matching problem is to find all locations at which a query of length $m$ matches a substring of a text of length $n$ with $k$-or-fewer differences. Simple and practical bit-vector algorithms have been designed for this problem, most notably the one used in agrep. These algorithms compute a bit representation of the current state-set of the $k$-difference automaton for the query, and asymptotically run in either $O(nmk/w)$ or $O(nm\log\sigma/w)$ time, where $w$ is the word size of the machine (e.g., 32 or 64 in practice) and $\sigma$ is the size of the pattern alphabet. Here we present an algorithm of comparable simplicity that requires only $O(nm/w)$ time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the problem. Thus, the algorithm's performance is independent of $k$, and it is found to be more efficient than the previous results for many choices of $k$ and small $m$.
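To make the bit representation of the dynamic programming matrix concrete, the following is a minimal Python sketch, assuming the query fits in a single machine word ($m \le w$) and following the commonly published form of the column recurrences, which may differ in detail from the paper's own presentation. Adjacent entries of a d.p. column differ by $-1$, $0$, or $+1$, so a column can be stored as two bit-vectors (here `pv` for $+1$ and `mv` for $-1$ vertical deltas) and advanced with a handful of word operations per text character. The function names `myers_search` and `sellers_search` are hypothetical, Python integers are masked to $m$ bits to emulate a fixed-width word, and positions are reported as 0-based end indices in the text.

```python
# Hypothetical helper names; a sketch under the stated assumptions, not the paper's code.

def sellers_search(pattern: str, text: str, k: int) -> list[int]:
    """Reference O(nm) dynamic programming: C[0][j] = 0 and C[i][0] = i."""
    m = len(pattern)
    col = list(range(m + 1))  # column j = 0
    ends = []
    for j, c in enumerate(text):
        new = [0]  # C[0][j] = 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + cost))
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


def myers_search(pattern: str, text: str, k: int) -> list[int]:
    """Bit-vector version: bit i of pv/mv is the +1/-1 vertical delta at row i + 1.

    Only C[m][j] (the score) is kept explicitly; the rest of the column is
    encoded by the deltas, so each text character costs O(1) word operations.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1
    high = 1 << (m - 1)  # bit for row m
    peq: dict[str, int] = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, score = mask, 0, m  # column 0: C[i][0] = i, so every vertical delta is +1
    ends = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) & mask) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)  # rows whose horizontal delta is +1
        mh = pv & xh                   # rows whose horizontal delta is -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Row 0 is all zeros when searching, so a 0 delta is shifted in at the top.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends


print(myers_search("abc", "xabcx", 1))    # [2, 3, 4]
print(sellers_search("abc", "xabcx", 1))  # [2, 3, 4]
```

The running time of the bit-vector loop does not depend on $k$: the threshold is only compared against the score after each column, which is the property the abstract relies on.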
Moreover, because the algorithm is not dependent on $k$, it can be used to rapidly compute blocks of the dynamic programming matrix as in the 4-Russians algorithm of Wu et al. (1996). This gives rise to an $O(kn/w)$ expected-time algorithm for the case where $m$ may be arbitrarily large. In practice this new algorithm, which computes a region of the dynamic programming (d.p.) matrix $w$ entries at a time using the basic algorithm as a subroutine, is significantly faster than our previous 4-Russians algorithm, which computes the same region 4 or 5 entries at a time using table lookup. This performance improvement yields code that is either superior to or competitive with all existing algorithms except for some filtration algorithms, which are superior when $k/m$ is sufficiently small.
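The blocking idea can be sketched in Python as follows, assuming the query is split into $\lceil m/w \rceil$ blocks of $w$ rows, each advanced by the single-word step, with the horizontal delta leaving the bottom row of one block fed in as a carry ($-1$, $0$, or $+1$) to the top row of the next. The names `blocked_search` and `advance_block` are hypothetical. This sketch computes every block in every column, so it runs in $O(n\lceil m/w\rceil)$ time; the cutoff that restricts the work to blocks near the last row whose score is at most $k$, which is what yields the $O(kn/w)$ expected bound, is deliberately omitted.

```python
# Hypothetical helper names; a sketch of chaining w-bit blocks, without the
# expected-time cutoff, not the paper's code.

def blocked_search(pattern: str, text: str, k: int, w: int = 64) -> list[int]:
    """Block b holds rows b*w + 1 .. (b+1)*w; padding rows past m never match."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    nblocks = (m + w - 1) // w
    mask = (1 << w) - 1
    peq: dict[str, list[int]] = {}
    for i, c in enumerate(pattern):
        peq.setdefault(c, [0] * nblocks)[i // w] |= 1 << (i % w)
    zero = [0] * nblocks
    pv = [mask] * nblocks  # column 0: every vertical delta is +1
    mv = [0] * nblocks
    last_bit = 1 << ((m - 1) % w)  # row m inside the last block

    def advance_block(b: int, eq: int, hin: int) -> int:
        """Advance block b by one column; return the horizontal delta at its output row."""
        out_bit = last_bit if b == nblocks - 1 else 1 << (w - 1)
        pv_b, mv_b = pv[b], mv[b]
        xv = eq | mv_b
        if hin < 0:
            eq |= 1  # a -1 carry from the block above acts like a match in the first row
        xh = ((((eq & pv_b) + pv_b) & mask) ^ pv_b) | eq
        ph = mv_b | (~(xh | pv_b) & mask)
        mh = pv_b & xh
        hout = 1 if ph & out_bit else (-1 if mh & out_bit else 0)
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        if hin < 0:  # shift the incoming carry into the first row
            mh |= 1
        elif hin > 0:
            ph |= 1
        pv[b] = mh | (~(xv | ph) & mask)
        mv[b] = ph & xv
        return hout

    score, ends = m, []
    for j, c in enumerate(text):
        eqs = peq.get(c, zero)
        carry = 0  # row 0 is all zeros, so nothing enters the first block
        for b in range(nblocks):
            carry = advance_block(b, eqs[b], carry)
        score += carry  # the carry out of the last block is the delta at row m
        if score <= k:
            ends.append(j)
    return ends
```

For example, `blocked_search("abc", "xabcx", 1, w=2)` splits the three-row query into two blocks (the second padded with one unused row) and reports the end positions `[2, 3, 4]`, the same as a direct dynamic programming computation.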
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software