ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS

Published: 2012-02 Issue: 02 Volume: 23 Pages: 357-374
ISSN: 0129-0541
Container-title: International Journal of Foundations of Computer Science
Language: en
Short-container-title: Int. J. Found. Comput. Sci.

Author:

BURCSI PÉTER¹,CICALESE FERDINANDO²,FICI GABRIELE³,LIPTÁK ZSUZSANNA⁴

Affiliation:

1. Department of Computer Algebra, Faculty of Informatics, Eötvös Loránd University, 1/c Pázmány Péter sétány, H-1117 Budapest, Hungary

2. Dipartimento di Informatica ed Applicazioni, University of Salerno, Via Ponte don Melillo, 84084 Fisciano (SA), Italy

3. Laboratoire I3S - CNRS/Université de Nice-Sophia Antipolis, 2000 route des lucioles, 06903 Sophia Antipolis, France

4. AG Genominformatik, Technische Fakultät, Bielefeld University, Postfach 100131, 33501, Bielefeld, Germany

Abstract

The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a_1, …, a_σ} is defined as the vector of multiplicities of the characters, p(s) = (p_1, …, p_σ), where p_i = |{j | s_j = a_i}|. A Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time. The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n²) time. The second algorithm finds all occurrences of a given Parikh vector in a text over an arbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More precisely, we present two variants of the algorithm, both using an O(n) size data structure, each of which can be constructed in O(n) time. The first solution is very simple and easy to implement and leads to an expected query time of O(n (σ / log σ)^(1/2) · (log m) / √m), where m = ∑_i q_i is the length of a string with Parikh vector q. The second uses wavelet trees and improves the expected runtime to O(n (σ / log σ)^(1/2) · 1 / √m), i.e., by a factor of log m.
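To make the two baseline ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' code. The function names, the dictionary representation of a Parikh vector, and the choice of "1" as the counted character are all assumptions made for illustration. The first function is the O(n) sliding window search. The second pair shows one way to get O(1) decision queries on a binary text after Θ(n²) preprocessing: sliding a fixed-length window by one position changes its number of ones by at most one, so the counts that occur for each length form an interval, and storing its two endpoints per length is enough.

```python
def find_jumbled_occurrences(text, query):
    """Sliding window search: start indices of substrings whose Parikh vector is `query`.

    `query` maps characters to multiplicities (a representation assumed for this sketch).
    """
    need = {c: k for c, k in query.items() if k > 0}
    m = sum(need.values())
    if m == 0 or m > len(text):
        return []
    # diff[c] = (occurrences of c in the current window) - (required occurrences of c)
    diff = {c: -k for c, k in need.items()}
    nonzero = len(diff)  # number of characters whose diff is not 0

    def bump(c, delta):
        nonlocal nonzero
        old = diff.get(c, 0)
        new = old + delta
        diff[c] = new
        if old == 0 and new != 0:
            nonzero += 1
        elif old != 0 and new == 0:
            nonzero -= 1

    hits = []
    for i, c in enumerate(text):
        bump(c, +1)                # character entering the window
        if i >= m:
            bump(text[i - m], -1)  # character leaving the window
        if i >= m - 1 and nonzero == 0:
            hits.append(i - m + 1)
    return hits


def build_binary_index(text, one="1"):
    """Theta(n^2) preprocessing: min and max number of `one`s over all windows of each length."""
    n = len(text)
    prefix = [0]
    for ch in text:
        prefix.append(prefix[-1] + (ch == one))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for m in range(1, n + 1):
        counts = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        lo[m], hi[m] = min(counts), max(counts)
    return lo, hi


def binary_occurs(index, zeros, ones):
    """O(1) decision query: does some non-empty substring have exactly these counts?"""
    lo, hi = index
    m = zeros + ones
    return 1 <= m < len(lo) and lo[m] <= ones <= hi[m]
```

For example, `find_jumbled_occurrences("abaab", {"a": 2, "b": 1})` reports the three windows "aba", "baa" and "aab". The binary index stores two integers per substring length, which matches the linear space bound in the abstract. The sub-linear expected-time algorithms for general alphabets are not sketched here.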

Publisher

World Scientific Pub Co Pte Ltd

Subject

Computer Science (miscellaneous)

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0129054112400175

References: 18 articles.

1. Efficient text fingerprinting via Parikh mapping

2. The Boyer–Moore–Galil String Searching Strategies Revisited

3. Sequencing from Compomers: Using Mass Spectrometry for DNA de novo Sequencing of 200+ nt

4. Simulating multiplexed SNP discovery rates using base-specific cleavage and mass spectrometry

5. A fast string searching algorithm

Cited by 43 articles.

1. Dyck Words, Lattice Paths, and Abelian Borders;International Journal of Foundations of Computer Science;2022-04

2. On infinite prefix normal words;Theoretical Computer Science;2021-03

3. Fast algorithms for single and multiple pattern Cartesian tree matching;Theoretical Computer Science;2021-01

4. Weighted Prefix Normal Words: Mind the Gap;Developments in Language Theory;2021

5. Finding patterns and periods in Cartesian tree matching;Theoretical Computer Science;2020-12