Sparse Text Indexing in Small Space

Author:

Bille Philip¹, Fischer Johannes², Gørtz Inge Li¹, Kopelowitz Tsvi³, Sach Benjamin⁴, Vildhøj Hjalte Wedel¹

Affiliation:

1. Technical University of Denmark, DTU Compute, Lyngby, Denmark

2. TU Dortmund, Department of Computer Science

3. Weizmann Institute of Science, Faculty of Mathematics and Computer Science, Rehovot, Israel

4. University of Bristol, Department of Computer Science, Merchant Venturer's Building, United Kingdom

Abstract

In this work, we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays, and sparse position heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during the construction. Attempts at breaking the naïve bound of Ω(nb) time for constructing sparse suffix trees in O(b) space can be traced back to the origins of string indexing in 1968. First results were not obtained until 1996, but only for the case in which the b suffixes were evenly spaced in T. In this article, there is no constraint on the locations of the suffixes. Our main contribution is to show that the sparse suffix tree (and array) can be constructed in O(n log² b) time. To achieve this, we develop a technique that allows one to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte Carlo, and outputs the correct tree with high probability. We then give a Las Vegas algorithm, which also uses O(b) space and runs in the same time bounds with high probability when b = O(√n). Additional trade-offs between space usage and construction time for the Monte Carlo algorithm are given. Finally, we show that, at the expense of slower pattern queries, it is possible to construct sparse position heaps in O(n + b log b) time and O(b) space.
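To make the objects in the abstract concrete, here is a minimal Python sketch of a sparse suffix array built the naïve way. It is not the article's algorithm: it sorts the b chosen positions by comparing suffixes character by character, so each comparison may scan up to n characters, and it stands in only for the kind of baseline the abstract sets out to beat. The function names `lcp` and `sparse_suffix_array` are illustrative assumptions, not names from the article.

```python
from functools import cmp_to_key


def lcp(text, i, j):
    """Longest common prefix length of text[i:] and text[j:], by direct scan."""
    n = len(text)
    k = 0
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


def sparse_suffix_array(text, positions):
    """Return the given start positions sorted by the suffixes they begin.

    Naive baseline (an assumption for illustration): apart from the text
    itself, only the list of b positions is stored.
    """
    n = len(text)

    def compare(i, j):
        if i == j:
            return 0
        k = lcp(text, i, j)
        if i + k == n:   # suffix i is a proper prefix of suffix j
            return -1
        if j + k == n:   # suffix j is a proper prefix of suffix i
            return 1
        return -1 if text[i + k] < text[j + k] else 1

    return sorted(positions, key=cmp_to_key(compare))


if __name__ == "__main__":
    order = sparse_suffix_array("banana", [0, 2, 4])
    print(order)
    # LCP values between lexicographically adjacent sampled suffixes
    print([lcp("banana", a, b) for a, b in zip(order, order[1:])])
```

The longest common prefix scan is the costly step in this sketch; the abstract's main technique is about answering a batch of b such queries efficiently within O(b) space.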

Funder

Danish Council for Independent Research ∣ Natural Sciences, the Danish Research Council

Advanced Technology Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Mathematics (miscellaneous)

Cited by 7 articles.
