Affiliation:
1. University of Hong Kong
2. National Tsing Hua University
3. Kyushu University
Abstract
Let $T$ be a string with $n$ characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for $T$ in optimal space (i.e., $O(n)$ bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically.
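The FM-index of Ferragina and Manzini answers a pattern query by backward search over the Burrows-Wheeler transform (BWT) of $T$. The Python sketch below illustrates only that search logic, under stated assumptions: the suffix array is built by naive sorting, rank is a linear scan rather than a compressed rank structure, and the sentinel `$` is assumed not to occur in $T$ and to sort before every other character. All function names are hypothetical, and nothing here reflects the dynamic index of this article.

```python
def bwt_index(text: str):
    """Build a toy FM-index: suffix array, BWT and C table (hypothetical helper).

    Uses naive suffix sorting and naive rank, so it illustrates the search
    logic only, not the O(n)-bit space bound.
    """
    s = text + "$"  # assumption: '$' is absent from text and sorts first
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] == '$' when i == 0
    # C[c] = number of characters in s that are strictly smaller than c
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    return sa, bwt, C


def rank(bwt: str, ch: str, i: int) -> int:
    """Number of occurrences of ch in bwt[0:i] (linear scan)."""
    return bwt.count(ch, 0, i)


def backward_search(pattern: str, bwt: str, C: dict) -> tuple[int, int]:
    """Half-open suffix-array interval [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo = C[ch] + rank(bwt, ch, lo)
        hi = C[ch] + rank(bwt, ch, hi)
        if lo >= hi:
            return 0, 0
    return lo, hi


sa, bwt, C = bwt_index("abracadabra")
lo, hi = backward_search("abra", bwt, C)
print(bwt)                          # ard$rcaaaabb
print(hi - lo, sorted(sa[lo:hi]))   # 2 [0, 7]
```

The width `hi - lo` of the final interval is the number of occurrences, and the suffix-array entries inside it are their starting positions.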
This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of $O(n)$ bits for a text collection $L$ of total length $n$, which can be updated in $O(|T|\log n)$ time when a text $T$ is inserted into or deleted from $L$; also, the index supports searching the occurrences of any pattern $P$ in all texts in $L$ in $O(|P|\log n + \mathit{occ}\log^2 n)$ time, where $\mathit{occ}$ is the number of occurrences.
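To make the semantics of the library management problem concrete, here is a hypothetical, uncompressed reference model in Python: texts are kept verbatim under integer ids, and a query scans every text, so it makes no attempt to meet the $O(n)$-bit or $O(|P|\log n + \mathit{occ}\log^2 n)$ bounds. It only pins down what inserting a text, deleting a text, and reporting all $\mathit{occ}$ (possibly overlapping) occurrences mean; the class and method names are illustrative assumptions.

```python
class NaiveLibrary:
    """Hypothetical reference model of the library management problem.

    Not compressed and not efficient: search scans every stored text.
    """

    def __init__(self) -> None:
        self._texts: dict[int, str] = {}  # text id -> text
        self._next_id = 0

    def insert(self, text: str) -> int:
        """Add a text to the collection and return its id."""
        tid = self._next_id
        self._texts[tid] = text
        self._next_id += 1
        return tid

    def delete(self, tid: int) -> None:
        """Remove the text with the given id from the collection."""
        del self._texts[tid]

    def search(self, pattern: str) -> list[tuple[int, int]]:
        """All (text id, offset) pairs where pattern occurs; len(result) == occ."""
        hits = []
        for tid, text in self._texts.items():
            start = text.find(pattern)
            while start != -1:
                hits.append((tid, start))
                start = text.find(pattern, start + 1)  # allow overlaps
        return hits


lib = NaiveLibrary()
a = lib.insert("banana")
b = lib.insert("ananas")
print(lib.search("ana"))  # [(0, 1), (0, 3), (1, 0), (1, 2)]
lib.delete(a)
print(lib.search("ana"))  # [(1, 0), (1, 2)]
```

The compressed index described above supports the same three operations, but within $O(n)$ bits and the stated time bounds.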
Our second result is a compressed solution to the dictionary matching problem, where we show an index of $O(d)$ bits for a pattern collection $D$ of total length $d$, which can be updated in $O(|P|\log^2 d)$ time when a pattern $P$ is inserted into or deleted from $D$; also, the index supports searching the occurrences of all patterns of $D$ in any text $T$ in $O((|T| + \mathit{occ})\log^2 d)$ time. Compared with the $O(d\log d)$-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by only roughly a factor of $\log d$.
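For orientation, a classic static solution to dictionary matching is the Aho-Corasick automaton; it is neither the suffix-tree approach of Amir et al. [1995] nor the compressed index of this article. The Python sketch below (function names hypothetical; patterns assumed non-empty) reports every occurrence of every pattern of $D$ in $T$ as a (start offset, pattern index) pair. Its weakness in the dynamic setting is visible in the code: inserting or deleting a pattern means rebuilding the whole automaton.

```python
from collections import deque


def build_automaton(patterns):
    """Aho-Corasick automaton for a static list of non-empty patterns."""
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # failure link of each state
    out = [[]]    # indices of patterns recognised on reaching each state
    for idx, p in enumerate(patterns):
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first pass sets failure links; depth-1 states keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit shorter matches
    return goto, fail, out


def match_all(text, patterns):
    """Return (start offset, pattern index) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, idx))
    return hits


print(match_all("ushers", ["he", "she", "his", "hers"]))  # [(1, 1), (2, 0), (2, 3)]
```

On the text `ushers` it reports `she` at offset 1 and both `he` and `hers` at offset 2; the number of reported pairs is $\mathit{occ}$.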
The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an $O(n)$-bit representation of a suffix tree for a dynamic collection of texts whose total length is $n$, which supports insertion and deletion of a text $T$ in $O(|T|\log^2 n)$ time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the course of obtaining this result, we also derive the first $O(n)$-bit representation for maintaining $n$ pairs of balanced parentheses in $O(\log n/\log\log n)$ time per operation, matching the time complexity of the previous $O(n\log n)$-bit solution.
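A tree with $n$ nodes can be written as $n$ pairs of balanced parentheses (an opening parenthesis when a node is entered, a closing one when it is left), and finding a matching parenthesis then corresponds to tree navigation. The hypothetical Python class below is a naive model of a dynamic parenthesis sequence in which every operation takes linear time; it only pins down the operations and is not the $O(n)$-bit, $O(\log n/\log\log n)$-time structure described above. Inserting a pair around a balanced substring corresponds to adding a node whose children are a contiguous run of siblings.

```python
class NaiveParentheses:
    """Hypothetical naive dynamic balanced-parentheses sequence (linear time per op)."""

    def __init__(self, seq: str = "") -> None:
        self.bits = list(seq)

    def __str__(self) -> str:
        return "".join(self.bits)

    def find_close(self, i: int) -> int:
        """Return the position of the ')' matching the '(' at position i."""
        if self.bits[i] != "(":
            raise ValueError("position i must hold '('")
        excess = 0
        for j in range(i, len(self.bits)):
            excess += 1 if self.bits[j] == "(" else -1
            if excess == 0:
                return j
        raise ValueError("sequence is not balanced")

    def insert_pair(self, i: int, j: int) -> None:
        """Insert '(' before position i and ')' before position j (i <= j).

        The caller must ensure bits[i:j] is balanced so the result stays balanced.
        """
        self.bits.insert(j, ")")  # insert the later one first so i is not shifted
        self.bits.insert(i, "(")

    def delete_pair(self, i: int) -> None:
        """Delete the '(' at position i together with its matching ')'."""
        j = self.find_close(i)
        del self.bits[j]  # delete the later one first so i is not shifted
        del self.bits[i]


t = NaiveParentheses("(()())")  # root with two leaf children
print(t.find_close(0))           # 5
t.insert_pair(1, 5)              # wrap both leaves in a new internal node
print(t)                         # ((()()))
t.delete_pair(1)                 # remove that node again
print(t)                         # (()())
```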
Publisher: Association for Computing Machinery (ACM)
Subject: Mathematics (miscellaneous)
Cited by: 62 articles.
1. AN IMPROVED INDEXING METHOD FOR QUERYING BIG XML FILES;Journal of Computer Science and Cybernetics;2023-12-25
2. Dynamic “Succincter”;2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS);2023-11-06
3. Compressed and queryable self-indexes for RDF archives;Knowledge and Information Systems;2023-08-29
4. Space/time-efficient RDF stores based on circular suffix sorting;The Journal of Supercomputing;2022-10-25
5. Dynamic suffix array with polylogarithmic queries and updates;Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing;2022-06-09