String Indexing with Compressed Patterns


Philip Bille¹, Inge Li Gørtz¹, Teresa Anna Steiner¹


1. Technical University of Denmark, DTU Compute, Denmark


Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this article, we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear-space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way, we develop several data-structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77-compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
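To make "the compressed size of the pattern" concrete, here is a minimal sketch of one common textbook formulation of LZ77, in which each phrase is a triple (source position, copy length, next character). The function names `lz77_parse` and `lz77_decode` are illustrative assumptions, and the exact LZ77 variant used in the article may differ in details. The parser is deliberately naive (quadratic or worse) and only shows what a compressed pattern looks like; it is not the article's data structure.

```python
def lz77_parse(s):
    """Greedy LZ77 parse into (source, length, next_char) phrases.

    Assumption: self-referential copies are allowed, i.e. a copy may
    overlap the position where the phrase itself starts.
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best_src, best_len = 0, 0
        for src in range(i):
            length = 0
            # Stop one character early so every phrase can end
            # with an explicit next character.
            while i + length < n - 1 and s[src + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = src, length
        phrases.append((best_src, best_len, s[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Rebuild the string; copying one character at a time makes
    overlapping (self-referential) copies work."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])
        out.append(ch)
    return "".join(out)


pattern = "abababab"
parsed = lz77_parse(pattern)
print(parsed)               # [(0, 0, 'a'), (0, 0, 'b'), (0, 5, 'b')]
print(lz77_decode(parsed))  # abababab
```

Here the pattern has length 8 but only 3 phrases. In the client-server scenario of the abstract, the client would send a phrase list of this kind, and the goal is query time that depends on the number of phrases and not on the decompressed length.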


Danish Research Council



Association for Computing Machinery (ACM)


Mathematics (miscellaneous)







