Affiliation:
1. Stony Brook University, Stony Brook, NY
2. Google Inc., CA, USA
3. Shanghai University of Finance and Economics, Shanghai, China
4. MIT CSAIL, MA, USA
Abstract
Most B-tree articles assume that all N keys have the same size K, that f = B/K keys fit in a disk block, and therefore that the search cost is O(log_{f+1} N) block transfers. When keys have variable size, however, B-tree operations have no nontrivial performance guarantees.
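As a back-of-the-envelope illustration of this fixed-size-key bound (not taken from the article), the short Python sketch below evaluates f = B/K and log_{f+1} N for one hypothetical choice of parameters; the values B = 4096, K = 16, and N = 10^9 are assumptions made only for this example.

    import math

    # Hypothetical parameters: 4 KiB disk blocks, 16-byte keys, one billion keys.
    B = 4096    # block size in bytes
    K = 16      # size of every key in bytes (uniform-size case)
    N = 10**9   # number of keys

    f = B // K                    # keys (pivots) per block, i.e., the fanout
    cost = math.log(N, f + 1)     # O(log_{f+1} N) block transfers per search
    print(f"fanout f = {f}, search cost ~ {math.ceil(cost)} block transfers")
    # prints: fanout f = 256, search cost ~ 4 block transfers

With typical parameters a search touches only a handful of blocks; this is the behavior the atomic-key bounds below aim to preserve when key sizes vary.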
This article provides B-tree-like performance guarantees for dictionaries that contain keys of different sizes, in a model in which keys must be stored and compared as opaque objects. The resulting atomic-key dictionaries exhibit performance bounds in terms of the average key size and match the standard B-tree bounds when all keys are the same size. Atomic-key dictionaries can be built with minimal modification to the B-tree structure, simply by choosing the pivot keys properly.
This article describes both static and dynamic atomic-key dictionaries. In the static case, if there are N keys with average size K, the search cost is O(⌈K/B⌉ log_{1+⌈B/K⌉} N) expected block transfers. It is not possible to transform these expected bounds into worst-case bounds. The cost to build the tree is O(NK) operations and O(NK/B) transfers if all keys are presented in sorted order. If not, the cost is the sorting cost.
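To get a feel for how this bound degrades as the average key size K grows relative to the block size B, the following hedged Python sketch simply evaluates ⌈K/B⌉ · log_{1+⌈B/K⌉} N with constant factors dropped; the parameter values are illustrative assumptions, not figures from the article.

    import math

    def atomic_key_search_cost(N, K, B):
        # Evaluate the O(ceil(K/B) * log_{1 + ceil(B/K)} N) search bound,
        # ignoring constant factors.
        per_level = math.ceil(K / B)    # blocks read per level of the tree
        base = 1 + math.ceil(B / K)     # effective branching factor
        return per_level * math.log(N, base)

    N = 10**9
    print(atomic_key_search_cost(N, K=16, B=4096))     # small keys: ~3.7 transfers
    print(atomic_key_search_cost(N, K=65536, B=4096))  # 64 KiB keys: ~478 transfers

When keys are small the expression matches the classical log_{f+1} N cost; when keys are much larger than a block, the ⌈K/B⌉ factor, the cost of merely reading one key, dominates, as it must.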
For the dynamic dictionaries, the amortized cost to insert a key κ of arbitrary length at an arbitrary rank is dominated by the cost to search for κ. Specifically, the amortized cost to insert a key κ of arbitrary length and random rank is O(⌈K/B⌉ log_{1+⌈B/K⌉} N + |κ|/B) transfers. The article also gives a dynamic-programming algorithm for constructing a search tree with minimal expected cost.
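The article's dynamic program is not reproduced here. Purely as a point of reference for readers unfamiliar with this style of construction, the sketch below is the classical O(n^3) dynamic program for an optimal binary search tree under fixed access probabilities; it shares the build-a-table-over-key-ranges structure, but its cost model (expected comparisons rather than expected block transfers for variable-size keys) is an assumption of this sketch, not the article's algorithm.

    def optimal_bst_cost(p):
        # p[i] is the access probability of the i-th key in sorted order.
        # Returns the minimum expected search depth (root counted as depth 1).
        n = len(p)
        prefix = [0.0] * (n + 1)              # prefix[i] = p[0] + ... + p[i-1]
        for i in range(n):
            prefix[i + 1] = prefix[i] + p[i]
        # cost[i][j] = optimal expected cost for the half-open key range i..j
        cost = [[0.0] * (n + 1) for _ in range(n + 1)]
        for length in range(1, n + 1):
            for i in range(n - length + 1):
                j = i + length
                weight = prefix[j] - prefix[i]
                # try every key r in the range as the root of this subtree
                cost[i][j] = weight + min(cost[i][r] + cost[r + 1][j]
                                          for r in range(i, j))
        return cost[0][n]

    print(optimal_bst_cost([0.1, 0.2, 0.4, 0.3]))   # prints 1.7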
This article also gives a cache-oblivious static atomic-key B-tree, which achieves the same asymptotic performance as the static B-tree dictionary mentioned previously. A cache-oblivious data structure or algorithm is not parameterized by the block size B or memory size M of the memory hierarchy; rather, it is universal, working simultaneously for all possible values of B and M. On a machine with block size B, if there are N keys with average size K, a search operation costs O(⌈K/B⌉ log_{1+⌈B/K⌉} N) block transfers in expectation. This cache-oblivious layout can be built in O(N log(NK)) processor operations.
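The variable-size-key construction itself is beyond the scope of this abstract. As background only, the hedged sketch below computes the classical van Emde Boas (recursive) layout of a complete binary search tree over fixed-size keys, which is the standard device by which cache-oblivious search trees achieve O(log_B N) transfers without knowing B; the article's contribution is extending this kind of guarantee to keys of different sizes.

    def veb_layout(sorted_keys):
        # Lay out a complete binary search tree over `sorted_keys`
        # (length 2^h - 1) in van Emde Boas order: the top ~half of the
        # levels first, then each bottom subtree contiguously, recursively.
        n = len(sorted_keys)
        if n <= 1:
            return list(sorted_keys)
        height = n.bit_length()          # n = 2^height - 1
        bottom_h = height // 2           # levels in each bottom subtree
        bottom_size = 2 ** bottom_h - 1
        # In sorted order the keys alternate: bottom subtree, top key,
        # bottom subtree, top key, ..., bottom subtree.
        top_keys, bottom_subtrees = [], []
        i = 0
        while i < n:
            bottom_subtrees.append(sorted_keys[i:i + bottom_size])
            i += bottom_size
            if i < n:
                top_keys.append(sorted_keys[i])
                i += 1
        layout = veb_layout(top_keys)
        for subtree in bottom_subtrees:
            layout += veb_layout(subtree)
        return layout

    print(veb_layout(list(range(1, 8))))   # prints [4, 2, 6, 1, 3, 5, 7]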
Funder
National Science Foundation
U.S. Department of Energy
Singapore-MIT Alliance for Research and Technology Centre
Sandia National Laboratories
Publisher
Association for Computing Machinery (ACM)
Cited by
1 article.
1. Layered List Labeling. Proceedings of the ACM on Management of Data, 2024-05-10.