Compressed full-text indexes

Published: 2007-04-12, Issue: 1, Volume: 39, Page: 2
ISSN: 0360-0300
Container-title: ACM Computing Surveys
Language: en
Short-container-title: ACM Comput. Surv.

Authors:

Gonzalo Navarro¹, Veli Mäkinen²

Affiliations:

1. University of Chile, Santiago, Chile

2. University of Helsinki, Helsinki, Finland

Abstract

Full-text indexes provide fast substring search over large text collections. A serious problem with these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes, which in addition contain enough information to reproduce any text portion, so they replace the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising results in a very short time, radically changing the status of this area in less than five years. The most successful indexes nowadays are able to obtain almost optimal space and search time simultaneously. In this article we present the main concepts underlying (compressed) self-indexes. We explain the relationship between text entropy and the regularities that show up in index structures and permit compressing them. Then we cover the most relevant self-indexes, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems. Our aim is to give the background needed to understand and follow the developments in this area.
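
The self-index idea can be made concrete with a toy example. The Python sketch below is illustrative only and is not taken from the survey. It uses an uncompressed, FM-index-style design (one well-known family of self-indexes) that keeps just the Burrows–Wheeler transform (BWT) of the text plus a small table C. Backward search counts pattern occurrences, and LF-mapping rebuilds the text from the BWT alone, which is the sense in which a self-index replaces the text. The class and function names are hypothetical, the BWT is built by naive suffix sorting, and rank is a linear scan. A real compressed self-index would instead store the BWT in a compressed rank structure so that its size tracks the text's entropy.

```python
from collections import Counter


def bwt_from_suffix_array(text: str) -> str:
    """Burrows-Wheeler transform via a naive suffix sort (fine for toy inputs)."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[j] is the character preceding the j-th smallest suffix; text[-1] is '$'.
    return "".join(text[i - 1] for i in sa)


class ToyFMIndex:
    """Uncompressed FM-index sketch: keeps only the BWT plus the table C."""

    def __init__(self, text: str):
        # Assumption: text ends with one '$' that sorts before all other symbols
        # (true for ASCII letters and digits).
        if not text.endswith("$") or text.count("$") != 1:
            raise ValueError("text must end with a single '$' sentinel")
        self.bwt = bwt_from_suffix_array(text)
        counts = Counter(self.bwt)
        # C[c] = number of text symbols strictly smaller than c.
        self.C = {}
        total = 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in bwt[0:i]; a linear scan stands in for a compressed rank structure."""
        return self.bwt.count(c, 0, i)

    def count(self, pattern: str) -> int:
        """Backward search: number of (possibly overlapping) occurrences of pattern."""
        lo, hi = 0, len(self.bwt)  # current suffix-array interval [lo, hi)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.rank(c, lo)
            hi = self.C[c] + self.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo

    def extract(self) -> str:
        """Rebuild the text (without '$') from the BWT alone via LF-mapping."""
        chars = []
        row = 0  # row 0 is the suffix "$"; its BWT symbol is the last real character
        for _ in range(len(self.bwt) - 1):
            c = self.bwt[row]
            chars.append(c)
            row = self.C[c] + self.rank(c, row)  # LF-mapping: step one position left
        return "".join(reversed(chars))


if __name__ == "__main__":
    idx = ToyFMIndex("banana$")
    print(idx.bwt)           # annb$aa
    print(idx.count("ana"))  # 2
    print(idx.extract())     # banana
```

Note that the original text is never stored: both counting and extraction work on the BWT and C only. The linear-time rank is the piece that compressed self-indexes replace with succinct structures, trading a little time for space close to that of the compressed text.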

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/1216370.1216372

Cited by 499 articles.

1. Accelerating range minimum queries with ray tracing cores; Future Generation Computer Systems; 2024-08

2. Whole-Genome Alignment: Methods, Challenges, and Future Directions; Applied Sciences; 2024-06-03

3. r-indexing the eBWT; Information and Computation; 2024-06

4. Constructing and indexing the bijective and extended Burrows–Wheeler transform; Information and Computation; 2024-03

5. CoCo-trie: Data-aware compression and indexing of strings; Information Systems; 2024-02