ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS

Author:

BURCSI PÉTER (1), CICALESE FERDINANDO (2), FICI GABRIELE (3), LIPTÁK ZSUZSANNA (4)

Affiliation:

1. Department of Computer Algebra, Faculty of Informatics, Eötvös Loránd University, 1/c Pázmány Péter sétány, H-1117 Budapest, Hungary

2. Dipartimento di Informatica ed Applicazioni, University of Salerno, Via Ponte don Melillo, 84084 Fisciano (SA), Italy

3. Laboratoire I3S - CNRS/Université de Nice-Sophia Antipolis, 2000 route des lucioles, 06903 Sophia Antipolis, France

4. AG Genominformatik, Technische Fakultät, Bielefeld University, Postfach 100131, 33501, Bielefeld, Germany

Abstract

The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a_1, …, a_σ} is defined as the vector of multiplicities of the characters, p(s) = (p_1, …, p_σ), where p_i = |{j | s_j = a_i}|. Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time. The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n^2) time. The second algorithm finds all occurrences of a given Parikh vector in a text over an arbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More precisely, we present two variants of the algorithm, both using an O(n) size data structure, each of which can be constructed in O(n) time. The first solution is very simple and easy to implement and leads to an expected query time of [Formula: see text], where m = ∑_i q_i is the length of a string with Parikh vector q. The second uses wavelet trees and improves the expected runtime to [Formula: see text], i.e., by a factor of log m.
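The two simplest ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the choice of "b" as the counted character, and the handling of the empty query are assumptions made for illustration. `jumbled_occurrences` is a plain O(n) sliding window. `build_binary_index` is one natural reading of the binary-text structure, relying on the fact that in a binary text the number of b's among all windows of a fixed length m takes every value between its minimum and maximum, so storing those two numbers per length gives a linear-size table, O(1) decisions, and a trivial quadratic-time construction.

```python
def jumbled_occurrences(text, query, alphabet):
    """Sliding window: start indices i with p(text[i:i+m]) == query.

    `query` is a tuple of counts in the order given by `alphabet`.
    The empty query (m == 0) is treated as having no occurrences,
    which is a convention chosen for this sketch.
    """
    m, n = sum(query), len(text)
    if m == 0 or m > n:
        return []
    index = {a: k for k, a in enumerate(alphabet)}
    window = [0] * len(alphabet)
    for ch in text[:m]:
        window[index[ch]] += 1
    # Number of coordinates where the window differs from the query.
    mismatches = sum(1 for w, q in zip(window, query) if w != q)
    hits = [0] if mismatches == 0 else []
    for i in range(m, n):
        # Slide by one: text[i] enters, text[i - m] leaves.
        for ch, delta in ((text[i], 1), (text[i - m], -1)):
            k = index[ch]
            if window[k] == query[k]:
                mismatches += 1
            window[k] += delta
            if window[k] == query[k]:
                mismatches -= 1
        if mismatches == 0:
            hits.append(i - m + 1)
    return hits


def build_binary_index(text, one="b"):
    """Quadratic-time preprocessing for a binary text.

    Returns (lo, hi), where lo[m] and hi[m] are the minimum and maximum
    number of `one` characters in any window of length m.
    """
    n = len(text)
    prefix = [0]
    for ch in text:
        prefix.append(prefix[-1] + (ch == one))
    lo, hi = [0] * (n + 1), [0] * (n + 1)
    for m in range(1, n + 1):
        counts = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        lo[m], hi[m] = min(counts), max(counts)
    return lo, hi


def binary_occurs(index, x, y):
    """O(1) decision: does the Parikh vector (x, y) occur in the text?"""
    lo, hi = index
    m = x + y
    return 0 < m < len(lo) and lo[m] <= y <= hi[m]
```

For example, with the text "abaab" over the alphabet "ab", `jumbled_occurrences("abaab", (1, 1), "ab")` reports the windows "ab", "ba" and "ab", while `binary_occurs` answers existence queries without rescanning the text. The sub-linear expected-time algorithms and the wavelet-tree variant from the abstract are not covered by this sketch.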

Publisher

World Scientific Pub Co Pte Lt

Subject

Computer Science (miscellaneous)

Cited by 43 articles.

1. Dyck Words, Lattice Paths, and Abelian Borders;International Journal of Foundations of Computer Science;2022-04

2. On infinite prefix normal words;Theoretical Computer Science;2021-03

3. Fast algorithms for single and multiple pattern Cartesian tree matching;Theoretical Computer Science;2021-01

4. Weighted Prefix Normal Words: Mind the Gap;Developments in Language Theory;2021

5. Finding patterns and periods in Cartesian tree matching;Theoretical Computer Science;2020-12
