Abstract
String search is fundamental in many text processing applications. Sunday recently gave several algorithms to find the first occurrence of a pattern string as a substring of a text, providing experimental data from searches in a text of about 200K characters to support his claim that his algorithms are faster than the standard Boyer-Moore algorithm.
We present a methodology for the average-case analysis of the performance of string search algorithms---for such algorithms, a worst-case analysis does not yield much useful information, since the performance of the algorithm is directly affected by such characteristics as the size of the character set, the character frequencies, and the structure of the text. Knuth described a finite automaton which can be used to save information about character comparisons. Baeza-Yates, Gonnet, and Regnier gave a probabilistic analysis of the worst- and average-case behavior of a string search algorithm based upon such an automaton. We construct Knuth automata to model Sunday's algorithms and use the methods of Baeza-Yates et al. to obtain an average-case analysis which confirms Sunday's experimental data.
Publisher
Association for Computing Machinery (ACM)
Subject
Theoretical Computer Science
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献