A Note on Exponential Length Substrings in Pattern Matching


Ulmeanu Vlad-Adrian (1)


(1) Polytechnic University of Bucharest


Abstract. This note describes a hash-based mass-searching algorithm that reports, for each entry of a dictionary, a (count, location of first match) pair against a string \(s\) of length \(n\). The presented implementation uses all substrings of \(s\) whose lengths are powers of \(2\) to construct an offline algorithm that can, in some cases, reach a complexity of \(O(n \log^2 n)\) even if there are \(O(n^2)\) possible matches. If the dictionary size is limited to \(m\), then the precalculation complexity is \(O(m + n \log^2 n)\), and the search complexity is bounded by \(O(\min(n \sqrt{m} \log n, \, n^2 \log n))\), although the algorithm performs better than this bound in practice.
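The building block named in the abstract, hashing every substring of \(s\) whose length is a power of \(2\), can be illustrated with a minimal Python sketch. This is not the note's algorithm and does not reach its stated complexity: it only shows how such hashes can be built by doubling and how a pattern can be compared against them block by block. The polynomial hash, the constants `MOD` and `BASE`, and the function names are all assumptions made for illustration, and a hash collision could in principle yield a false match.

```python
MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime), not from the note
BASE = 911382323      # assumed hash base, not from the note


def build_pow2_hashes(s):
    """Return levels, where levels[k][i] is the hash of s[i : i + 2**k]."""
    n = len(s)
    levels = [[ord(c) + 1 for c in s]]   # level 0: single characters
    shift = BASE                          # BASE ** length (mod MOD)
    length = 1
    while 2 * length <= n:
        prev = levels[-1]
        # A block of size 2*length is two adjacent blocks of size length.
        cur = [(prev[i] * shift + prev[i + length]) % MOD
               for i in range(n - 2 * length + 1)]
        levels.append(cur)
        shift = shift * shift % MOD
        length *= 2
    return levels


def chunk_hashes(p):
    """Split p into blocks of decreasing power-of-two length; hash each."""
    out, pos = [], 0
    for k in range(len(p).bit_length() - 1, -1, -1):
        if (len(p) >> k) & 1:
            h = 0
            for c in p[pos:pos + (1 << k)]:
                h = (h * BASE + ord(c) + 1) % MOD
            out.append((k, h))
            pos += 1 << k
    return out


def count_and_first(s, p, levels):
    """Naive scan: return (count, first match index or -1) for p in s."""
    chunks = chunk_hashes(p)
    count, first = 0, -1
    for start in range(len(s) - len(p) + 1):
        pos = start
        for k, h in chunks:
            if levels[k][pos] != h:
                break
            pos += 1 << k
        else:
            count += 1
            if first < 0:
                first = start
    return count, first
```

Each pattern of length \(\ell\) is split into at most \(\lfloor \log_2 \ell \rfloor + 1\) blocks, so one candidate position costs \(O(\log \ell)\) hash comparisons. The sketch scans every starting position for every pattern separately, so sharing work across dictionary entries, which the bounds in the abstract depend on, is left out.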


Research Square Platform LLC








