Author:
Grabowski S., Raniszewski M.
Abstract
Full-text indexing aims to build a data structure over a given text that can efficiently find arbitrary text patterns and, ideally, requires little space. We propose two suffix-array-inspired full-text indexes. The first, called SA-hash, augments the suffix array with a hash table; it speeds up pattern searches by significantly narrowing the search interval before the binary search phase. The second, called FBCSA, is a compact data structure similar to Mäkinen’s compact suffix array (MakCSA), but working on fixed-size blocks. Experiments on the widely used Pizza & Chili datasets show that SA-hash is about 2–3 times faster in pattern searches (counts) than the standard suffix array, at the price of 0.2n–1.1n bytes of extra space, where n is the text length. FBCSA, in one of the presented variants, reduces the suffix array size by a factor of about 1.5–2 while coming close to the suffix array in search times, and it is faster than its competitors known from the literature, MakCSA and LCSA.
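The SA-hash idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a naive suffix array construction, uses a Python dict in place of the paper's hash table, and the names `build_prefix_table`, `count`, and the prefix length `k` are hypothetical, chosen only for illustration. The dict maps each length-`k` prefix to the suffix array interval of suffixes starting with it, so the binary search runs over that interval instead of the whole array.

```python
def build_suffix_array(text):
    # Naive O(n^2 log n) construction; adequate for a small illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_prefix_table(text, sa, k):
    # Map every length-k prefix to the half-open rank interval [lo, hi) of
    # suffixes that start with it. Such suffixes are contiguous in the array.
    table = {}
    for rank, pos in enumerate(sa):
        if pos + k <= len(text):
            key = text[pos:pos + k]
            if key in table:
                table[key][1] = rank + 1
            else:
                table[key] = [rank, rank + 1]
    return table


def _bound(text, sa, pattern, lo, hi, strict):
    # Binary search in [lo, hi) for the first rank whose length-m prefix is
    # >= pattern (strict=False) or > pattern (strict=True).
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (strict and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count(text, sa, table, k, pattern):
    # Count occurrences of pattern in text.
    lo, hi = 0, len(sa)
    if len(pattern) >= k:
        interval = table.get(pattern[:k])
        if interval is None:
            return 0  # no suffix starts with the pattern's first k symbols
        lo, hi = interval  # narrowed interval from the hash lookup
    # Patterns shorter than k fall back to the full suffix array range.
    first = _bound(text, sa, pattern, lo, hi, strict=False)
    last = _bound(text, sa, pattern, lo, hi, strict=True)
    return last - first


text = "abracadabra"
k = 2
sa = build_suffix_array(text)
table = build_prefix_table(text, sa, k)
print(count(text, sa, table, k, "abra"))
```

In this sketch, a larger `k` narrows the interval further but makes the table bigger, which mirrors the speed versus extra space trade-off reported in the abstract.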
Subject
Artificial Intelligence, Computer Networks and Communications, General Engineering, Information Systems, Atomic and Molecular Physics, and Optics
Cited by
2 articles.