Compressed text indexes

Published:2009-02 Volume:13
ISSN:1084-6654
Container-title:ACM Journal of Experimental Algorithmics
language:en
Short-container-title:ACM J. Exp. Algorithmics

Author:

Ferragina Paolo¹,González Rodrigo²,Navarro Gonzalo²,Venturini Rossano¹

Affiliation:

1. University of Pisa

2. University of Chile

Abstract

A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This represents a significant advancement over the (full-)text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, this algorithmic technology has matured up to a point where theoretical research is giving way to practical developments. Nonetheless this requires significant programming skills, a deep engineering effort, and a strong algorithmic background to dig into the research results. To date only isolated implementations and focused comparisons of compressed indexes have been reported, and they missed a common API, which prevented their re-use or deployment within other applications. The goal of this article is to fill this gap. First, we present the existing implementations of compressed indexes from a practitioner's point of view. Second, we introduce the Pizza&Chili site, which offers tuned implementations and a standardized API for the most successful compressed full-text self-indexes, together with effective test-beds and scripts for their automatic validation and test. Third, we show the results of our extensive experiments on these codes with the aim of demonstrating the practical relevance of this novel algorithmic technology.

Funder

Millennium Nucleus Center for Web Research

Publisher

Association for Computing Machinery (ACM)

Subject

Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/1412228.1455268

Reference: 62 articles.

1. Aluru S. and Ko P. 2008. Encyclopedia of Algorithms. Springer Chapter on “Lookup Tables Suffix Trees and Suffix Arrays”.

2. Efficient implementation of suffix trees

Cited by 64 articles.

1. Homomorphic Compression: Making Text Processing on Compression Unlimited;Proceedings of the ACM on Management of Data;2023-12-08

2. Improving Low-Resource Chinese Named Entity Recognition Using Bidirectional Encoder Representation from Transformers and Lexicon Adapter;Applied Sciences;2023-09-27

3. DdERT: Research on Named Entity Recognition for Mine Hoist Using a Chinese BERT Model;Electronics;2023-09-26

4. CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression;Proceedings of the ACM on Management of Data;2023-05-26

5. Text Indexing for Long Patterns: Anchors are All you Need;Proceedings of the VLDB Endowment;2023-05