B-Trees and Cache-Oblivious B-Trees with Different-Sized Atomic Keys

Author:

Michael A. Bender (1), Roozbeh Ebrahimi (2), Haodong Hu (3), Bradley C. Kuszmaul (4)

Affiliation:

1. Stony Brook University, Stony Brook, NY

2. Google Inc., CA, USA

3. Shanghai University of Finance and Economics, Shanghai, China

4. MIT CSAIL, MA, USA

Abstract

Most B-tree articles assume that all $N$ keys have the same size $K$, that $f = B/K$ keys fit in a disk block, and therefore that the search cost is $O(\log_{f+1} N)$ block transfers. When keys have variable size, however, B-tree operations have no nontrivial performance guarantees. This article provides B-tree-like performance guarantees on dictionaries that contain keys of different sizes in a model in which keys must be stored and compared as opaque objects. The resulting atomic-key dictionaries exhibit performance bounds in terms of the average key size and match the standard B-tree bounds when all keys are the same size. Atomic-key dictionaries can be built with minimal modification to the B-tree structure, simply by choosing the pivot keys properly. This article describes both static and dynamic atomic-key dictionaries. In the static case, if there are $N$ keys with average size $K$, the search cost is $O(\lceil K/B \rceil \log_{1+\lceil B/K \rceil} N)$ expected transfers. It is not possible to transform these expected bounds into worst-case bounds. The cost to build the tree is $O(NK)$ operations and $O(NK/B)$ transfers if all keys are presented in sorted order. If not, the cost is the sorting cost. For the dynamic dictionaries, the amortized cost to insert a key $\kappa$ of arbitrary length at an arbitrary rank is dominated by the cost to search for $\kappa$. Specifically, the amortized cost to insert a key $\kappa$ of arbitrary length and random rank is $O(\lceil K/B \rceil \log_{1+\lceil B/K \rceil} N + |\kappa|/B)$ transfers. A dynamic-programming algorithm is shown for constructing a search tree with minimal expected cost.
This article also gives a cache-oblivious static atomic-key B-tree, which achieves the same asymptotic performance as the static B-tree dictionary described above. A cache-oblivious data structure or algorithm is not parameterized by the block size $B$ or memory size $M$ in the memory hierarchy; rather, it is universal, working simultaneously for all possible values of $B$ or $M$. On a machine with block size $B$, if there are $N$ keys with average size $K$, a search operation costs $O(\lceil K/B \rceil \log_{1+\lceil B/K \rceil} N)$ block transfers in expectation. This cache-oblivious layout can be built in $O(N \log(NK))$ processor operations.
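
To make the two search bounds in the abstract concrete, the following minimal Python sketch evaluates them numerically with all hidden constants dropped. The function names and the sample values of N, K, and B are illustrative assumptions, not taken from the article; the sketch only evaluates the stated formulas and does not implement the data structure itself.

```python
import math


def uniform_search_bound(n_keys: int, key_size: int, block_size: int) -> float:
    """Classic bound log_{f+1} N with f = B / K (all keys the same size).

    Hypothetical helper: assumes key_size <= block_size so that f >= 1.
    """
    if key_size > block_size:
        raise ValueError("the uniform bound assumes at least one key per block")
    fanout = block_size // key_size
    return math.log(n_keys) / math.log(fanout + 1)


def atomic_key_search_bound(n_keys: int, avg_key_size: float, block_size: int) -> float:
    """Atomic-key bound ceil(K/B) * log_{1 + ceil(B/K)} N, constants dropped."""
    blocks_per_key = math.ceil(avg_key_size / block_size)
    log_base = 1 + math.ceil(block_size / avg_key_size)
    return blocks_per_key * math.log(n_keys) / math.log(log_base)


if __name__ == "__main__":
    # Illustrative values only: one million keys, 4096-byte blocks.
    n, b = 10**6, 4096
    for k in (16, 512, 4096, 8192):
        print(k, round(atomic_key_search_bound(n, k, b), 2))
```

When K divides B, the two functions agree, which mirrors the abstract's claim that the atomic-key bounds match the equal-size case. When the average key is larger than a block, the logarithm's base falls to 2 and the factor ⌈K/B⌉ accounts for reading each multi-block key.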

Funder

National Science Foundation

U.S. Department of Energy

Singapore-MIT Alliance for Research and Technology Centre

Sandia National Laboratories

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems


