Affiliation:
1. University of Chile, Santiago, Chile
Abstract
We introduce a compression technique for suffix arrays. It is sensitive to the compressibility of the text and local, meaning that random portions of the suffix array can be decompressed by accessing mostly contiguous memory areas. This makes decompression very fast, especially when various contiguous cells must be accessed.
Our main technical contributions are the following. First, we show that runs of consecutive values that are known to appear in function Ψ(i) = A⁻¹[A[i] + 1] of suffix arrays A of compressible texts also show up as repetitions in the differential suffix array A′[i] = A[i] − A[i−1]. Second, we use Re-Pair, a grammar-based compressor, to compress the differential suffix array, and upper bound its compression ratio in terms of the number of runs. Third, we show how to compact the space used by the grammar rules by up to 50%, while still permitting direct access to the rules. Fourth, we develop specific variants of Re-Pair that work using knowledge of Ψ, and use much less space than the general Re-Pair compressor, while achieving almost the same compression ratios. Fifth, we implement the scheme and compare it exhaustively with previous work, including the first implementations of previous theoretical proposals.
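The first contribution can be checked on a toy input. The sketch below is only an illustration and not the paper's implementation: it assumes 0-based indexing, a text ending in a unique smallest terminator "$", a naive suffix sort, Ψ wrapping around at the last text position, and the convention A′[0] = A[0]. The function names (suffix_array, psi_and_diff, run_copies) are hypothetical. Whenever Ψ(i) = Ψ(i−1) + 1 and neither suffix involved is the last one in the text, it follows from A[Ψ(i)] = A[i] + 1 that A′[Ψ(i)] = A′[i], so a run in Ψ copies a stretch of the differential array to another place.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def psi_and_diff(text):
    """Return (A, psi, diff) for a text, all 0-indexed.

    psi[i]  = A^{-1}[(A[i] + 1) mod n]   (wrap-around is an assumption of this sketch)
    diff[i] = A[i] - A[i-1], with diff[0] = A[0] by convention
    """
    n = len(text)
    A = suffix_array(text)
    inv = [0] * n
    for rank, pos in enumerate(A):
        inv[pos] = rank
    psi = [inv[(A[i] + 1) % n] for i in range(n)]
    diff = [A[0]] + [A[i] - A[i - 1] for i in range(1, n)]
    return A, psi, diff


def run_copies(A, psi):
    """List pairs (i, psi[i]) where a run in psi forces diff[psi[i]] == diff[i]."""
    n = len(A)
    pairs = []
    for i in range(1, n):
        # Skip positions where A[i] or A[i-1] is the last text position:
        # there psi wraps around and the identity does not apply.
        if n - 1 in (A[i], A[i - 1]):
            continue
        if psi[i] == psi[i - 1] + 1:
            pairs.append((i, psi[i]))
    return pairs


if __name__ == "__main__":
    A, psi, diff = psi_and_diff("abab$")
    print("A    =", A)
    print("psi  =", psi)
    print("diff =", diff)
    for i, j in run_copies(A, psi):
        print(f"diff[{j}] == diff[{i}] == {diff[i]}")
```

On highly repetitive texts the copied stretches become long, which is the kind of repetition a grammar-based compressor such as Re-Pair can exploit on the differential array.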
Funder
Fondo Nacional de Desarrollo Científico y Tecnológico
Publisher
Association for Computing Machinery (ACM)
Subject
Theoretical Computer Science
Cited by
5 articles.