Affiliation:
1. Stony Brook University, Stony Brook, NY
2. Google Inc., CA, USA
3. Shanghai University of Finance and Economics, Shanghai, China
4. MIT CSAIL, MA, USA
Abstract
Most B-tree articles assume that all N keys have the same size K, that f = B/K keys fit in a disk block, and therefore that the search cost is O(log_{f+1} N) block transfers. When keys have variable size, however, B-tree operations have no nontrivial performance guarantees.
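As a back-of-the-envelope illustration of this fixed-size-key bound (not taken from the article), the short Python sketch below evaluates f = B/K and log_{f+1} N for one hypothetical choice of parameters; the values B = 4096, K = 16, and N = 10^9 are assumptions made only for this example.

    import math

    # Hypothetical parameters: 4 KiB disk blocks, 16-byte keys, one billion keys.
    B = 4096    # block size in bytes
    K = 16      # size of every key in bytes (uniform-size case)
    N = 10**9   # number of keys

    f = B // K                    # keys (pivots) per block, i.e., the fanout
    cost = math.log(N, f + 1)     # O(log_{f+1} N) block transfers per search
    print(f"fanout f = {f}, search cost ~ {math.ceil(cost)} block transfers")
    # prints: fanout f = 256, search cost ~ 4 block transfers

With typical parameters a search touches only a handful of blocks; this is the behavior the atomic-key bounds below aim to preserve when key sizes vary.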
This article provides B-tree-like performance guarantees for dictionaries that contain keys of different sizes, in a model in which keys must be stored and compared as opaque objects. The resulting atomic-key dictionaries exhibit performance bounds in terms of the average key size and match the standard B-tree bounds when all keys are the same size. Atomic-key dictionaries can be built with minimal modification to the B-tree structure, simply by choosing the pivot keys properly.
This article describes both static and dynamic atomic-key dictionaries. In the static case, if there are N keys with average size K, the search cost is O(⌈K/B⌉ log_{1+⌈B/K⌉} N) expected block transfers. It is not possible to transform these expected bounds into worst-case bounds. The cost to build the tree is O(NK) operations and O(NK/B) transfers if all keys are presented in sorted order. If not, the cost is the sorting cost.
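To get a feel for how this bound degrades as the average key size K grows relative to the block size B, the following hedged Python sketch simply evaluates ⌈K/B⌉ · log_{1+⌈B/K⌉} N with constant factors dropped; the parameter values are illustrative assumptions, not figures from the article.

    import math

    def atomic_key_search_cost(N, K, B):
        # Evaluate the O(ceil(K/B) * log_{1 + ceil(B/K)} N) search bound,
        # ignoring constant factors.
        per_level = math.ceil(K / B)    # blocks read per level of the tree
        base = 1 + math.ceil(B / K)     # effective branching factor
        return per_level * math.log(N, base)

    N = 10**9
    print(atomic_key_search_cost(N, K=16, B=4096))     # small keys: ~3.7 transfers
    print(atomic_key_search_cost(N, K=65536, B=4096))  # 64 KiB keys: ~478 transfers

When keys are small the expression matches the classical log_{f+1} N cost; when keys are much larger than a block, the ⌈K/B⌉ factor, the cost of merely reading one key, dominates, as it must.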
For the dynamic dictionaries, the amortized cost to insert a key κ of arbitrary length at an arbitrary rank is dominated by the cost to search for κ. Specifically, the amortized cost to insert a key κ of arbitrary length and random rank is O(⌈K/B⌉ log_{1+⌈B/K⌉} N + |κ|/B) transfers. The article also gives a dynamic-programming algorithm for constructing a search tree with minimal expected cost.
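The article's dynamic program is not reproduced here. Purely as a point of reference for readers unfamiliar with this style of construction, the sketch below is the classical O(n^3) dynamic program for an optimal binary search tree under fixed access probabilities; it shares the build-a-table-over-key-ranges structure, but its cost model (expected comparisons rather than expected block transfers for variable-size keys) is an assumption of this sketch, not the article's algorithm.

    def optimal_bst_cost(p):
        # p[i] is the access probability of the i-th key in sorted order.
        # Returns the minimum expected search depth (root counted as depth 1).
        n = len(p)
        prefix = [0.0] * (n + 1)              # prefix[i] = p[0] + ... + p[i-1]
        for i in range(n):
            prefix[i + 1] = prefix[i] + p[i]
        # cost[i][j] = optimal expected cost for the half-open key range i..j
        cost = [[0.0] * (n + 1) for _ in range(n + 1)]
        for length in range(1, n + 1):
            for i in range(n - length + 1):
                j = i + length
                weight = prefix[j] - prefix[i]
                # try every key r in the range as the root of this subtree
                cost[i][j] = weight + min(cost[i][r] + cost[r + 1][j]
                                          for r in range(i, j))
        return cost[0][n]

    print(optimal_bst_cost([0.1, 0.2, 0.4, 0.3]))   # prints 1.7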
This article also gives a cache-oblivious static atomic-key B-tree, which achieves the same asymptotic performance as the static B-tree dictionary mentioned previously. A cache-oblivious data structure or algorithm is not parameterized by the block size B or memory size M of the memory hierarchy; rather, it is universal, working simultaneously for all possible values of B and M. On a machine with block size B, if there are N keys with average size K, a search operation costs O(⌈K/B⌉ log_{1+⌈B/K⌉} N) block transfers in expectation. This cache-oblivious layout can be built in O(N log(NK)) processor operations.
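The variable-size-key construction itself is beyond the scope of this abstract. As background only, the hedged sketch below computes the classical van Emde Boas (recursive) layout of a complete binary search tree over fixed-size keys, which is the standard device by which cache-oblivious search trees achieve O(log_B N) transfers without knowing B; the article's contribution is extending this kind of guarantee to keys of different sizes.

    def veb_layout(sorted_keys):
        # Lay out a complete binary search tree over `sorted_keys`
        # (length 2^h - 1) in van Emde Boas order: the top ~half of the
        # levels first, then each bottom subtree contiguously, recursively.
        n = len(sorted_keys)
        if n <= 1:
            return list(sorted_keys)
        height = n.bit_length()          # n = 2^height - 1
        bottom_h = height // 2           # levels in each bottom subtree
        bottom_size = 2 ** bottom_h - 1
        # In sorted order the keys alternate: bottom subtree, top key,
        # bottom subtree, top key, ..., bottom subtree.
        top_keys, bottom_subtrees = [], []
        i = 0
        while i < n:
            bottom_subtrees.append(sorted_keys[i:i + bottom_size])
            i += bottom_size
            if i < n:
                top_keys.append(sorted_keys[i])
                i += 1
        layout = veb_layout(top_keys)
        for subtree in bottom_subtrees:
            layout += veb_layout(subtree)
        return layout

    print(veb_layout(list(range(1, 8))))   # prints [4, 2, 6, 1, 3, 5, 7]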
Funder
National Science Foundation
U.S. Department of Energy
Singapore-MIT Alliance for Research and Technology Centre
Sandia National Laboratories
Publisher
Association for Computing Machinery (ACM)
Cited by
1 article.
1. Layered List Labeling. Proceedings of the ACM on Management of Data, 2024-05-10.