A data structure for a sequence of string accesses in external memory

Author:

Ciriani Valentina (1), Ferragina Paolo (2), Luccio Fabrizio (2), Muthukrishnan S. (3)

Affiliation:

1. University of Milano, Via Bramante, Crema

2. University of Pisa, Largo Pontecorvo, Pisa

3. Rutgers University, Piscataway, NJ

Abstract

We introduce a new paradigm for querying strings in external memory, suited to the execution of sequences of operations. Formally, given a dictionary of $n$ strings $S_1, \ldots, S_n$, we aim at supporting a search sequence for $m$ not necessarily distinct strings $T_1, T_2, \ldots, T_m$, as well as inserting and deleting individual strings. The dictionary is stored on disk, where each access to a disk page fetches $B$ items, the cost of an operation is the number of pages accessed (I/Os), and efficiency must be attained on entire sequences of string operations rather than on individual ones. Our approach relies on a novel and conceptually simple self-adjusting data structure (SASL) based on skip lists, which is also interesting per se. The search for the whole sequence $T_1, T_2, \ldots, T_m$ can be done in an expected number of I/Os: $O\left(\sum_{j=1}^{m} |T_j|/B + \sum_{i=1}^{n} n_i \log_B (m/n_i)\right)$, where each $T_j$ may or may not be present in the dictionary, and $n_i$ is the number of times $S_i$ is queried (i.e., the number of $T_j$'s equal to $S_i$). Moreover, inserting or deleting a string $S_i$ takes an expected amortized number $O(|S_i|/B + \log_B n)$ of I/Os. The term $\sum_{j=1}^{m} |T_j|/B$ in the search formula is a lower bound for reading the input, and the term $\sum_{i=1}^{n} n_i \log_B (m/n_i)$ (entropy of the query sequence) is a standard information-theoretic lower bound. We regard this result as the static optimality theorem for external-memory string access, as compared to Sleator and Tarjan's classical theorem for numerical dictionaries [Sleator and Tarjan 1985]. Finally, we reformulate the search bound if a cache is available, taking advantage of common prefixes among the strings examined in the search.
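To make the two terms of the search bound concrete, the following minimal sketch evaluates them for a given dictionary and query sequence. It is not the SASL data structure itself. It only computes the quantities inside the O(·), ignores constant factors, and uses the convention that dictionary strings that are never queried contribute zero to the entropy term. The function names `search_bound_terms` and `update_bound` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def search_bound_terms(dictionary, queries, B):
    """Return (input_term, entropy_term) of the search bound.

    input_term   = sum over queries T_j of |T_j| / B
    entropy_term = sum over dictionary strings S_i of n_i * log_B(m / n_i),
                   where n_i is how often S_i occurs in the query sequence.
    Constants hidden by the O-notation are ignored (illustrative assumption).
    """
    m = len(queries)
    counts = Counter(queries)
    input_term = sum(len(t) for t in queries) / B
    entropy_term = sum(
        counts[s] * math.log(m / counts[s], B)
        for s in set(dictionary)
        if counts[s] > 0  # unqueried strings contribute nothing
    )
    return input_term, entropy_term


def update_bound(s, n, B):
    """Quantity inside the amortized insert/delete bound: |S_i|/B + log_B n."""
    return len(s) / B + math.log(n, B)


if __name__ == "__main__":
    dictionary = ["ab", "cd"]
    uniform = ["ab", "ab", "cd", "cd"]   # both strings queried equally often
    skewed = ["ab", "ab", "ab", "ab"]    # one string queried every time
    print(search_bound_terms(dictionary, uniform, B=2))
    print(search_bound_terms(dictionary, skewed, B=2))
```

In this sketch the skewed sequence yields a zero entropy term while the uniform one does not, which reflects why the bound rewards a structure that adapts to frequently queried strings.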

Publisher

Association for Computing Machinery (ACM)

Subject

Mathematics (miscellaneous)

References (32 articles)

1. Ailamaki, A., DeWitt, D. J., Hill, M. D., and Wood, D. A. 1999. DBMSs on a modern processor: Where does time go? In Proceedings of the 25th International Conference on Very Large Data Bases. 266-277.

2. Static Optimality and Dynamic Search-Optimality in Lists and Trees

Cited by 6 articles.

1. SeedTree: A Dynamically Optimal and Local Self-Adjusting Tree;IEEE INFOCOM 2023 - IEEE Conference on Computer Communications;2023-05-17

2. Working Set Theorems for Routing in Self-Adjusting Skip List Networks;IEEE INFOCOM 2020 - IEEE Conference on Computer Communications;2020-07

3. Text Indexing;Encyclopedia of Algorithms;2016

4. A distribution-sensitive dictionary with low space overhead;Journal of Discrete Algorithms;2012-01

5. B-tries for disk-based string management;The VLDB Journal;2008-03-11
