Robust and Efficient Sorting with Offset-value Coding-Reference-Cited by-同舟云学术

Robust and Efficient Sorting with Offset-value Coding

Published:2023-03-13 Issue:1 Volume:48 Page:1-23
ISSN:0362-5915
Container-title:ACM Transactions on Database Systems
language:en
Short-container-title:ACM Trans. Database Syst.

Author:

Do Thanh¹^ORCID,Graefe Goetz²^ORCID

Affiliation:

1. Celonis Inc., New York, NY

2. Google Inc., Madison, WI, USA

Abstract

Sorting and searching are large parts of database query processing, e.g., in the forms of index creation, index maintenance, and index lookup, and comparing pairs of keys is a substantial part of the effort in sorting and searching. We have worked on simple, efficient implementations of decades-old, neglected, effective techniques for fast comparisons and fast sorting, in particular offset-value coding. In the process, we happened upon its mutually beneficial relationship with prefix truncation in run files as well as the duality of compression techniques in row- and column-format storage structures, namely prefix truncation and run-length encoding of leading key columns. We also found a beneficial relationship with consumers of sorted streams, e.g., merging parallel streams, in-stream aggregation, and merge join. We report on our implementation in the context of Google’s Napa and F1 Query systems as well as an experimental evaluation of performance and scalability.

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3570956

Reference30 articles.

1. Napa

2. R. Bayer and E. McCreight. 1970. Organization and maintenance of large ordered indices. In Proceedings of the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control (SIGFIDET’70). ACM, New York, NY, 107–141. 10.1145/1734663.1734671

3. PrefixB-trees

4. Duplicate record elimination in large data files

5. Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw, and Nathan Weizenbaum. 2010. FlumeJava: Easy, efficient data-parallel pipelines. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’10). ACM, New York, NY, 363–375. 10.1145/1806596.1806638

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. More Modern B-Tree Techniques;Foundations and Trends® in Databases;2024