A Note on Exponential Length Substrings in Pattern Matching

Author:

Ulmeanu Vlad-Adrian (1)

Affiliation:

1. Polytechnic University of Bucharest

Abstract

This note describes a hash-based mass-searching algorithm that reports, for each entry of a dictionary, the number of its occurrences and the location of its first match in a string \(s\) of length \(n\). The presented implementation uses all substrings of \(s\) whose lengths are powers of \(2\) to construct an offline algorithm that can, in some cases, reach a complexity of \(O(n \log^2 n)\) even if there are \(O(n^2)\) possible matches. If the dictionary size is bounded by \(m\), the precalculation complexity is \(O(m + n \log^2 n)\) and the search complexity is bounded by \(O(\min(n \sqrt{m} \log n, \, n^2 \log n))\), although the algorithm performs better in practice.
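To make the role of power-of-two substrings concrete, the following Python fragment is a minimal sketch of the basic building block only. It is not the implementation presented in the note and does not reach the bounds stated above: each query here costs \(O(n)\). It relies on the observation that a pattern of length \(L\) is covered by two (possibly overlapping) windows of length \(2^k\), where \(2^k\) is the largest power of two not exceeding \(L\). The hash parameters `MOD` and `BASE` and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only; MOD, BASE and every function name are assumptions.
MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime)
BASE = 911382323      # assumed polynomial base


def prefix_hashes(t):
    """Return prefix hashes and powers of BASE for the string t."""
    pre = [0] * (len(t) + 1)
    pw = [1] * (len(t) + 1)
    for i, ch in enumerate(t):
        pre[i + 1] = (pre[i] * BASE + ord(ch) + 1) % MOD
        pw[i + 1] = (pw[i] * BASE) % MOD
    return pre, pw


def window_hash(pre, pw, i, length):
    """Hash of t[i:i + length], derived from the prefix hashes of t."""
    return (pre[i + length] - pre[i] * pw[length]) % MOD


def power_of_two_tables(s):
    """tables[k][i] is the hash of s[i:i + 2**k], for every 2**k <= len(s)."""
    pre, pw = prefix_hashes(s)
    tables = []
    length = 1
    while length <= len(s):
        tables.append([window_hash(pre, pw, i, length)
                       for i in range(len(s) - length + 1)])
        length *= 2
    return tables


def count_and_first(tables, n, pattern):
    """Return (count, first index) of pattern in the text of length n.

    A position i matches when the two windows of length 2**k that cover
    the pattern (its head and its tail) both have matching hashes.
    Equal hashes are treated as equal substrings, so a collision could,
    with small probability, produce a false match.
    """
    m = len(pattern)
    if m == 0 or m > n:
        return 0, -1
    k = m.bit_length() - 1          # largest k with 2**k <= m
    size = 1 << k
    pre, pw = prefix_hashes(pattern)
    head = window_hash(pre, pw, 0, size)
    tail = window_hash(pre, pw, m - size, size)
    level = tables[k]
    count, first = 0, -1
    for i in range(n - m + 1):
        if level[i] == head and level[i + m - size] == tail:
            count += 1
            if first < 0:
                first = i
    return count, first
```

For example, after `tables = power_of_two_tables("abracadabra")`, the call `count_and_first(tables, 11, "abra")` reports two occurrences with the first at index 0. The tables occupy \(O(n \log n)\) space; how the note organizes them to answer a whole dictionary within the stated bounds is not shown here.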

Publisher

Research Square Platform LLC
