Solving String Problems on Graphs Using the Labeled Direct Product
Published: 2022-07-05
Issue: 10
Volume: 84
Page: 3008-3033
ISSN: 0178-4617
Container-title: Algorithmica
Language: en
Short-container-title: Algorithmica
Author:
Rizzo Nicola, Tomescu Alexandru I., Policriti Alberto
Abstract
Suffix trees are an important data structure at the core of optimal solutions to many fundamental string problems, such as exact pattern matching, longest common substring, matching statistics, and longest repeated substring. Recent lines of research have focused on extending some of these problems to vertex-labeled graphs, either by using efficient ad hoc approaches that do not generalize to all input graphs, or by indexing difficult graphs at the cost of worst-case exponential complexity. In the absence of a ubiquitous and polynomial tool like the suffix tree for labeled graphs, we introduce the labeled direct product of two graphs as a general tool for obtaining optimal algorithms in the worst case: we obtain conceptually simpler algorithms for the quadratic problems of string matching (SMLG) and longest common substring (LCSP) in labeled graphs. Our algorithms run in time linear in the size of the labeled product graph, which may be smaller than quadratic for some inputs, and their run-time is predictable, because the size of the labeled direct product graph can be precomputed efficiently. We also solve LCSP on graphs containing cycles, which was left as an open problem by Shimohira et al. in 2011. To show the power of the labeled product graph, we also apply it to solve the matching statistics (MSP) and the longest repeated string (LRSP) problems in labeled graphs. Moreover, we show that our (worst-case quadratic) algorithms are also optimal, conditioned on the Orthogonal Vectors Hypothesis. Finally, we complete the complexity picture around LCSP by studying it on undirected graphs.
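To make the central construction concrete, here is a minimal Python sketch of string matching in a labeled graph through a labeled product. It is an illustration under stated assumptions, not the authors' algorithm: it assumes the labeled direct product keeps only the vertex pairs whose labels agree and connects two pairs when both components have an edge, and it treats the pattern as a path graph. The function names `labeled_direct_product` and `has_match` and the dictionary-based graph encoding are hypothetical.

```python
from collections import deque


def labeled_direct_product(labels1, edges1, labels2, edges2):
    """Build a labeled product of two vertex-labeled directed graphs.

    labels*: dict mapping vertex -> single-character label
    edges*:  iterable of directed edges (u, v)
    Assumed definition: (u, v) is a product vertex iff both labels are equal;
    (u, v) -> (u2, v2) is a product edge iff u -> u2 and v -> v2 are edges.
    """
    # Group the second graph's vertices by label to enumerate only matching pairs.
    by_label = {}
    for v, c in labels2.items():
        by_label.setdefault(c, []).append(v)

    succ2 = {}
    for v, w in edges2:
        succ2.setdefault(v, []).append(w)

    adj = {(u, v): [] for u, c in labels1.items() for v in by_label.get(c, [])}
    for u, u2 in edges1:
        for v in by_label.get(labels1[u], []):
            for w in succ2.get(v, []):
                if labels2[w] == labels1[u2]:
                    adj[(u, v)].append((u2, w))
    return adj


def has_match(pattern, labels, edges):
    """Return True if some walk in the graph spells `pattern`."""
    if not pattern:
        return True
    # The pattern is modeled as a path graph 0 -> 1 -> ... -> m-1.
    p_labels = dict(enumerate(pattern))
    p_edges = [(i, i + 1) for i in range(len(pattern) - 1)]
    adj = labeled_direct_product(p_labels, p_edges, labels, edges)

    # Breadth-first search from every product vertex at pattern position 0.
    last = len(pattern) - 1
    queue = deque(p for p in adj if p[0] == 0)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node[0] == last:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

The search visits each product vertex and edge at most once, which mirrors the abstract's point that the running time is governed by the size of the product graph rather than by a fixed quadratic bound. Because the text graph may contain cycles, a walk is allowed to revisit its vertices.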
Funder
H2020 European Research Council, Academy of Finland
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,General Computer Science
References: 42 articles.
1. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Structuring labeled trees for optimal succinctness, and beyond. In: 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2005), 23-25 October 2005, Pittsburgh, PA, USA, Proceedings, pp. 184–196. IEEE Computer Society (2005). https://doi.org/10.1109/SFCS.2005.69
2. Garrison, E., Sirén, J., Novak, A.M., Hickey, G., Eizenga, J.M., Dawson, E.T., Jones, W., Garg, S., Markello, C., Lin, M.F., Paten, B., Durbin, R.: Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875 (2018). https://doi.org/10.1038/nbt.4227
3. Schneeberger, K., Hagmann, J., Ossowski, S., Warthmann, N., Gesing, S., Kohlbacher, O., Weigel, D.: Simultaneous alignment of short reads against multiple genomes. Genome Biol. 10, 98 (2009)
4. Akutsu, T.: A linear time pattern matching algorithm between a string and a tree. In: 4th Symposium on Combinatorial Pattern Matching, Padova, Italy, pp. 1–10 (1993)
5. Backurs, A., Indyk, P.: Which regular expression patterns are hard to match? In: IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pp. 457–466 (2016)
Cited by
3 articles.