Affiliation:
1. University of Ulm, Ulm, Germany
Abstract
The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to this date the
GSACA
algorithm introduced in 2015 is the only known non-recursive linear-time suffix array construction algorithm (SACA). Despite its interesting theoretical properties, there has been little effort in improving
GSACA
’s non-competitive real-world performance. There is a super-linear algorithm
DSH
, which relies on the same sorting principle and is faster than
DivSufSort
, the fastest SACA for over a decade. The purpose of this article is twofold: We analyse the sorting principle used in
GSACA
and
DSH
and exploit its properties to give an optimised linear-time algorithm, and we show that it can be very elegantly used to compute both the original extended Burrows-Wheeler transform (
eBWT
) and a bijective version of the Burrows-Wheeler transform (
BBWT
) in linear time. We call the algorithm “generic,” since it can be used to compute the regular suffix array and the variants used for the
BBWT
and
eBWT
. Our suffix array construction algorithm is not only significantly faster than
GSACA
but also outperforms
DivSufSort
and
DSH
. Our
BBWT
-algorithm is faster than or competitive with all other tested
BBWT
construction implementations on large or repetitive data, and our
eBWT
-algorithm is faster than all other programs on data that is not extremely repetitive.
Funder
Deutsche Forschungsgemeinschaft
Publisher
Association for Computing Machinery (ACM)
Reference33 articles.
1. Uwe Baier. 2015. Linear-time Suffix Sorting. Master’s Thesis. Ulm University.
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献